Overview


Dataset statistics

Number of variables: 96
Number of observations: 584592
Missing cells: 16055219
Missing cells (%): 28.6%
Total size in memory: 428.2 MiB
Average record size in memory: 768.0 B
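
The dataset statistics above are related by simple arithmetic. The following is a minimal sketch that recomputes the derived figures from the reported counts. The 8 bytes per cell is an assumption (one object pointer per Text cell on a 64-bit system), not something the report states; the variable names are illustrative.

```python
# Counts taken from the "Dataset statistics" block of this report.
n_variables = 96
n_observations = 584_592
missing_cells = 16_055_219

# Share of missing cells over the full variables x observations grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# ASSUMPTION: each Text cell is counted as one 8-byte object pointer.
bytes_per_cell = 8
record_size_bytes = n_variables * bytes_per_cell
total_mib = record_size_bytes * n_observations / 2**20

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {record_size_bytes} B")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under that assumption the recomputed values agree with the reported 28.6%, 768.0 B and 428.2 MiB.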

Variable types

Text: 96

Dataset

Description: Birds NMNH Extant Specimen Records 0054887-241126133413365
URL: https://doi.org/10.15468/dl.2en7ue

Alerts

license has constant value "CC0_1_0" Constant
publisher has constant value "National Museum of Natural History, Smithsonian Institution" Constant
institutionID has constant value "urn:lsid:biocol.org:col:34871" Constant
collectionID has constant value "urn:uuid:73d83e23-1999-42cd-b38a-c06a7d32d893" Constant
institutionCode has constant value "USNM" Constant
collectionCode has constant value "BIRDS" Constant
datasetName has constant value "NMNH Extant Biology" Constant
basisOfRecord has constant value "PRESERVED_SPECIMEN" Constant
occurrenceStatus has constant value "PRESENT" Constant
kingdom has constant value "Animalia" Constant
phylum has constant value "Chordata" Constant
class has constant value "Aves" Constant
datasetKey has constant value "821cc27a-e3bb-4bc5-ac34-89ada245069d" Constant
publishingCountry has constant value "US" Constant
kingdomKey has constant value "1" Constant
phylumKey has constant value "44" Constant
classKey has constant value "212" Constant
protocol has constant value "EML" Constant
lastCrawled has constant value "2024-12-02T11:48:23.416Z" Constant
publishedByGbifRegion has constant value "NORTH_AMERICA" Constant
recordNumber has 584474 (> 99.9%) missing values Missing
recordedBy has 7123 (1.2%) missing values Missing
sex has 112304 (19.2%) missing values Missing
lifeStage has 459507 (78.6%) missing values Missing
associatedSequences has 580105 (99.2%) missing values Missing
occurrenceRemarks has 572414 (97.9%) missing values Missing
eventDate has 41361 (7.1%) missing values Missing
startDayOfYear has 74069 (12.7%) missing values Missing
endDayOfYear has 74069 (12.7%) missing values Missing
year has 41376 (7.1%) missing values Missing
month has 53877 (9.2%) missing values Missing
day has 74434 (12.7%) missing values Missing
verbatimEventDate has 235442 (40.3%) missing values Missing
habitat has 567355 (97.1%) missing values Missing
continent has 27500 (4.7%) missing values Missing
waterBody has 558515 (95.5%) missing values Missing
stateProvince has 93871 (16.1%) missing values Missing
county has 353572 (60.5%) missing values Missing
locality has 107551 (18.4%) missing values Missing
verbatimElevation has 583323 (99.8%) missing values Missing
decimalLatitude has 556566 (95.2%) missing values Missing
decimalLongitude has 556566 (95.2%) missing values Missing
verbatimCoordinateSystem has 567281 (97.0%) missing values Missing
georeferenceProtocol has 583342 (99.8%) missing values Missing
identificationQualifier has 583894 (99.9%) missing values Missing
typeStatus has 580632 (99.3%) missing values Missing
identifiedBy has 581206 (99.4%) missing values Missing
specificEpithet has 7917 (1.4%) missing values Missing
infraspecificEpithet has 308675 (52.8%) missing values Missing
elevation has 498000 (85.2%) missing values Missing
elevationAccuracy has 574752 (98.3%) missing values Missing
distanceFromCentroidInMeters has 584584 (> 99.9%) missing values Missing
mediaType has 26095 (4.5%) missing values Missing
speciesKey has 7853 (1.3%) missing values Missing
species has 7853 (1.3%) missing values Missing
gbifRegion has 19462 (3.3%) missing values Missing
level0Gid has 562100 (96.2%) missing values Missing
level0Name has 562100 (96.2%) missing values Missing
level1Gid has 562129 (96.2%) missing values Missing
level1Name has 562129 (96.2%) missing values Missing
level2Gid has 562935 (96.3%) missing values Missing
level2Name has 563182 (96.3%) missing values Missing
level3Gid has 575359 (98.4%) missing values Missing
level3Name has 576369 (98.6%) missing values Missing
iucnRedListCategory has 273793 (46.8%) missing values Missing
gbifID has unique values Unique
occurrenceID has unique values Unique
catalogNumber has unique values Unique

Reproduction

Analysis started: 2025-01-08 22:54:32.986576
Analysis finished: 2025-01-08 22:54:56.133958
Duration: 23.15 seconds
Software version: ydata-profiling v4.12.1
Download configuration: config.json
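
As a quick consistency check, the reported duration can be recomputed from the two timestamps. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block of this report.
started = datetime.fromisoformat("2025-01-08 22:54:32.986576")
finished = datetime.fromisoformat("2025-01-08 22:54:56.133958")

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```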

Variables

gbifID
Text

Unique 

Distinct: 584592
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 5845920
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
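
To illustrate what a Unicode category breakdown looks like, here is a minimal sketch using Python's standard `unicodedata` module. It is not the report's own implementation; `category_counts` is a hypothetical helper, and the input `"CC0_1_0"` is the constant value of the license variable in this report. The two-letter codes map to the names used in the report: `Nd` is Decimal Number, `Lu` is Uppercase Letter, `Pc` is Connector Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


counts = category_counts(["CC0_1_0"])
total = sum(counts.values())

# Percentage share of each category, rounded to one decimal place.
shares = {cat: round(100 * n / total, 1) for cat, n in counts.items()}
print(counts)
print(shares)
```

For this value the shares come out as 42.9% Decimal Number, 28.6% Uppercase Letter and 28.6% Connector Punctuation, the same proportions the report lists for the license variable.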

Unique

Unique: 584592
Unique (%): 100.0%

Sample

1st row: 4601228301
2nd row: 1317203661
3rd row: 1322538154
4th row: 1317205864
5th row: 1317207704

Value | Count | Frequency (%)
4601228301 | 1 | < 0.1%
1322540164 | 1 | < 0.1%
1322550508 | 1 | < 0.1%
1317268099 | 1 | < 0.1%
1317208553 | 1 | < 0.1%
1322538154 | 1 | < 0.1%
1317205864 | 1 | < 0.1%
1317207704 | 1 | < 0.1%
1317208071 | 1 | < 0.1%
1317232225 | 1 | < 0.1%
Other values (584582) | 584582 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 1203296 | 20.6%
3 | 890257 | 15.2%
2 | 755814 | 12.9%
9 | 472136 | 8.1%
0 | 451322 | 7.7%
8 | 446725 | 7.6%
7 | 431175 | 7.4%
5 | 410207 | 7.0%
4 | 403288 | 6.9%
6 | 381700 | 6.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 5845920 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 1203296 | 20.6%
3 | 890257 | 15.2%
2 | 755814 | 12.9%
9 | 472136 | 8.1%
0 | 451322 | 7.7%
8 | 446725 | 7.6%
7 | 431175 | 7.4%
5 | 410207 | 7.0%
4 | 403288 | 6.9%
6 | 381700 | 6.5%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5845920 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 1203296 | 20.6%
3 | 890257 | 15.2%
2 | 755814 | 12.9%
9 | 472136 | 8.1%
0 | 451322 | 7.7%
8 | 446725 | 7.6%
7 | 431175 | 7.4%
5 | 410207 | 7.0%
4 | 403288 | 6.9%
6 | 381700 | 6.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5845920 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 1203296 | 20.6%
3 | 890257 | 15.2%
2 | 755814 | 12.9%
9 | 472136 | 8.1%
0 | 451322 | 7.7%
8 | 446725 | 7.6%
7 | 431175 | 7.4%
5 | 410207 | 7.0%
4 | 403288 | 6.9%
6 | 381700 | 6.5%

license
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 4092144
Distinct characters: 4
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CC0_1_0
2nd row: CC0_1_0
3rd row: CC0_1_0
4th row: CC0_1_0
5th row: CC0_1_0

Value | Count | Frequency (%)
cc0_1_0 | 584592 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
C | 1169184 | 28.6%
0 | 1169184 | 28.6%
_ | 1169184 | 28.6%
1 | 584592 | 14.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1753776 | 42.9%
Uppercase Letter | 1169184 | 28.6%
Connector Punctuation | 1169184 | 28.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1169184 | 66.7%
1 | 584592 | 33.3%

Uppercase Letter
Value | Count | Frequency (%)
C | 1169184 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1169184 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2922960 | 71.4%
Latin | 1169184 | 28.6%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1169184 | 40.0%
_ | 1169184 | 40.0%
1 | 584592 | 20.0%

Latin
Value | Count | Frequency (%)
C | 1169184 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4092144 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
C | 1169184 | 28.6%
0 | 1169184 | 28.6%
_ | 1169184 | 28.6%
1 | 584592 | 14.3%

Distinct: 11792
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

Characters and Unicode

Total characters: 11691840
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 4737
Unique (%): 0.8%

Sample

1st row: 2024-03-26T12:49:00Z
2nd row: 2022-07-12T14:29:00Z
3rd row: 2022-04-29T16:16:00Z
4th row: 2022-04-05T14:20:00Z
5th row: 2022-09-22T21:27:00Z

Value | Count | Frequency (%)
2024-09-19t15:58:00z | 8050 | 1.4%
2024-09-19t15:59:00z | 7282 | 1.2%
2024-09-19t15:57:00z | 6771 | 1.2%
2024-11-12t09:38:00z | 6108 | 1.0%
2024-09-19t15:43:00z | 3407 | 0.6%
2024-09-19t16:00:00z | 2927 | 0.5%
2022-09-22t21:42:00z | 2178 | 0.4%
2022-09-22t21:59:00z | 2177 | 0.4%
2022-09-22t20:03:00z | 2168 | 0.4%
2022-09-22t21:51:00z | 2164 | 0.4%
Other values (11782) | 541360 | 92.6%

Most occurring characters

Value | Count | Frequency (%)
2 | 2854206 | 24.4%
0 | 2776194 | 23.7%
- | 1169184 | 10.0%
: | 1169184 | 10.0%
1 | 800219 | 6.8%
T | 584592 | 5.0%
Z | 584592 | 5.0%
9 | 465374 | 4.0%
4 | 411253 | 3.5%
5 | 256891 | 2.2%
Other values (4) | 620151 | 5.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 8184288 | 70.0%
Dash Punctuation | 1169184 | 10.0%
Other Punctuation | 1169184 | 10.0%
Uppercase Letter | 1169184 | 10.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 2854206 | 34.9%
0 | 2776194 | 33.9%
1 | 800219 | 9.8%
9 | 465374 | 5.7%
4 | 411253 | 5.0%
5 | 256891 | 3.1%
3 | 228479 | 2.8%
7 | 147850 | 1.8%
8 | 125685 | 1.5%
6 | 118137 | 1.4%

Uppercase Letter
Value | Count | Frequency (%)
T | 584592 | 50.0%
Z | 584592 | 50.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1169184 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
: | 1169184 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 10522656 | 90.0%
Latin | 1169184 | 10.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 2854206 | 27.1%
0 | 2776194 | 26.4%
- | 1169184 | 11.1%
: | 1169184 | 11.1%
1 | 800219 | 7.6%
9 | 465374 | 4.4%
4 | 411253 | 3.9%
5 | 256891 | 2.4%
3 | 228479 | 2.2%
7 | 147850 | 1.4%
Other values (2) | 243822 | 2.3%

Latin
Value | Count | Frequency (%)
T | 584592 | 50.0%
Z | 584592 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11691840 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 2854206 | 24.4%
0 | 2776194 | 23.7%
- | 1169184 | 10.0%
: | 1169184 | 10.0%
1 | 800219 | 6.8%
T | 584592 | 5.0%
Z | 584592 | 5.0%
9 | 465374 | 4.0%
4 | 411253 | 3.5%
5 | 256891 | 2.2%
Other values (4) | 620151 | 5.3%

publisher
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 59
Median length: 59
Mean length: 59
Min length: 59

Characters and Unicode

Total characters: 34490928
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: National Museum of Natural History, Smithsonian Institution
2nd row: National Museum of Natural History, Smithsonian Institution
3rd row: National Museum of Natural History, Smithsonian Institution
4th row: National Museum of Natural History, Smithsonian Institution
5th row: National Museum of Natural History, Smithsonian Institution

Value | Count | Frequency (%)
national | 584592 | 14.3%
museum | 584592 | 14.3%
of | 584592 | 14.3%
natural | 584592 | 14.3%
history | 584592 | 14.3%
smithsonian | 584592 | 14.3%
institution | 584592 | 14.3%

Most occurring characters

Value | Count | Frequency (%)
t | 4092144 | 11.9%
i | 3507552 | 10.2%
(space) | 3507552 | 10.2%
a | 2922960 | 8.5%
o | 2922960 | 8.5%
n | 2922960 | 8.5%
s | 2338368 | 6.8%
u | 2338368 | 6.8%
r | 1169184 | 3.4%
m | 1169184 | 3.4%
Other values (11) | 7599696 | 22.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 26891232 | 78.0%
Space Separator | 3507552 | 10.2%
Uppercase Letter | 3507552 | 10.2%
Other Punctuation | 584592 | 1.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 4092144 | 15.2%
i | 3507552 | 13.0%
a | 2922960 | 10.9%
o | 2922960 | 10.9%
n | 2922960 | 10.9%
s | 2338368 | 8.7%
u | 2338368 | 8.7%
r | 1169184 | 4.3%
m | 1169184 | 4.3%
l | 1169184 | 4.3%
Other values (4) | 2338368 | 8.7%

Uppercase Letter
Value | Count | Frequency (%)
N | 1169184 | 33.3%
M | 584592 | 16.7%
H | 584592 | 16.7%
S | 584592 | 16.7%
I | 584592 | 16.7%

Space Separator
Value | Count | Frequency (%)
(space) | 3507552 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 584592 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 30398784 | 88.1%
Common | 4092144 | 11.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 4092144 | 13.5%
i | 3507552 | 11.5%
a | 2922960 | 9.6%
o | 2922960 | 9.6%
n | 2922960 | 9.6%
s | 2338368 | 7.7%
u | 2338368 | 7.7%
r | 1169184 | 3.8%
m | 1169184 | 3.8%
N | 1169184 | 3.8%
Other values (9) | 5845920 | 19.2%

Common
Value | Count | Frequency (%)
(space) | 3507552 | 85.7%
, | 584592 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 34490928 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
t | 4092144 | 11.9%
i | 3507552 | 10.2%
(space) | 3507552 | 10.2%
a | 2922960 | 8.5%
o | 2922960 | 8.5%
n | 2922960 | 8.5%
s | 2338368 | 6.8%
u | 2338368 | 6.8%
r | 1169184 | 3.4%
m | 1169184 | 3.4%
Other values (11) | 7599696 | 22.0%

institutionID
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 29
Median length: 29
Mean length: 29
Min length: 29

Characters and Unicode

Total characters: 16953168
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: urn:lsid:biocol.org:col:34871
2nd row: urn:lsid:biocol.org:col:34871
3rd row: urn:lsid:biocol.org:col:34871
4th row: urn:lsid:biocol.org:col:34871
5th row: urn:lsid:biocol.org:col:34871

Value | Count | Frequency (%)
urn:lsid:biocol.org:col:34871 | 584592 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
o | 2338368 | 13.8%
: | 2338368 | 13.8%
l | 1753776 | 10.3%
i | 1169184 | 6.9%
r | 1169184 | 6.9%
c | 1169184 | 6.9%
g | 584592 | 3.4%
7 | 584592 | 3.4%
8 | 584592 | 3.4%
4 | 584592 | 3.4%
Other values (8) | 4676736 | 27.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 11107248 | 65.5%
Other Punctuation | 2922960 | 17.2%
Decimal Number | 2922960 | 17.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 2338368 | 21.1%
l | 1753776 | 15.8%
i | 1169184 | 10.5%
r | 1169184 | 10.5%
c | 1169184 | 10.5%
g | 584592 | 5.3%
u | 584592 | 5.3%
b | 584592 | 5.3%
d | 584592 | 5.3%
s | 584592 | 5.3%

Decimal Number
Value | Count | Frequency (%)
7 | 584592 | 20.0%
8 | 584592 | 20.0%
4 | 584592 | 20.0%
3 | 584592 | 20.0%
1 | 584592 | 20.0%

Other Punctuation
Value | Count | Frequency (%)
: | 2338368 | 80.0%
. | 584592 | 20.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 11107248 | 65.5%
Common | 5845920 | 34.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
o | 2338368 | 21.1%
l | 1753776 | 15.8%
i | 1169184 | 10.5%
r | 1169184 | 10.5%
c | 1169184 | 10.5%
g | 584592 | 5.3%
u | 584592 | 5.3%
b | 584592 | 5.3%
d | 584592 | 5.3%
s | 584592 | 5.3%

Common
Value | Count | Frequency (%)
: | 2338368 | 40.0%
7 | 584592 | 10.0%
8 | 584592 | 10.0%
4 | 584592 | 10.0%
3 | 584592 | 10.0%
. | 584592 | 10.0%
1 | 584592 | 10.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 16953168 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
o | 2338368 | 13.8%
: | 2338368 | 13.8%
l | 1753776 | 10.3%
i | 1169184 | 6.9%
r | 1169184 | 6.9%
c | 1169184 | 6.9%
g | 584592 | 3.4%
7 | 584592 | 3.4%
8 | 584592 | 3.4%
4 | 584592 | 3.4%
Other values (8) | 4676736 | 27.6%

collectionID
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 45
Median length: 45
Mean length: 45
Min length: 45

Characters and Unicode

Total characters: 26306640
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: urn:uuid:73d83e23-1999-42cd-b38a-c06a7d32d893
2nd row: urn:uuid:73d83e23-1999-42cd-b38a-c06a7d32d893
3rd row: urn:uuid:73d83e23-1999-42cd-b38a-c06a7d32d893
4th row: urn:uuid:73d83e23-1999-42cd-b38a-c06a7d32d893
5th row: urn:uuid:73d83e23-1999-42cd-b38a-c06a7d32d893

Value | Count | Frequency (%)
urn:uuid:73d83e23-1999-42cd-b38a-c06a7d32d893 | 584592 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
3 | 3507552 | 13.3%
d | 2922960 | 11.1%
9 | 2338368 | 8.9%
- | 2338368 | 8.9%
u | 1753776 | 6.7%
8 | 1753776 | 6.7%
2 | 1753776 | 6.7%
7 | 1169184 | 4.4%
: | 1169184 | 4.4%
c | 1169184 | 4.4%
Other values (10) | 6430512 | 24.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 12861024 | 48.9%
Lowercase Letter | 9938064 | 37.8%
Dash Punctuation | 2338368 | 8.9%
Other Punctuation | 1169184 | 4.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 3507552 | 27.3%
9 | 2338368 | 18.2%
8 | 1753776 | 13.6%
2 | 1753776 | 13.6%
7 | 1169184 | 9.1%
1 | 584592 | 4.5%
4 | 584592 | 4.5%
0 | 584592 | 4.5%
6 | 584592 | 4.5%

Lowercase Letter
Value | Count | Frequency (%)
d | 2922960 | 29.4%
u | 1753776 | 17.6%
c | 1169184 | 11.8%
a | 1169184 | 11.8%
i | 584592 | 5.9%
e | 584592 | 5.9%
r | 584592 | 5.9%
n | 584592 | 5.9%
b | 584592 | 5.9%

Dash Punctuation
Value | Count | Frequency (%)
- | 2338368 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
: | 1169184 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 16368576 | 62.2%
Latin | 9938064 | 37.8%

Most frequent character per script

Common
Value | Count | Frequency (%)
3 | 3507552 | 21.4%
9 | 2338368 | 14.3%
- | 2338368 | 14.3%
8 | 1753776 | 10.7%
2 | 1753776 | 10.7%
7 | 1169184 | 7.1%
: | 1169184 | 7.1%
1 | 584592 | 3.6%
4 | 584592 | 3.6%
0 | 584592 | 3.6%

Latin
Value | Count | Frequency (%)
d | 2922960 | 29.4%
u | 1753776 | 17.6%
c | 1169184 | 11.8%
a | 1169184 | 11.8%
i | 584592 | 5.9%
e | 584592 | 5.9%
r | 584592 | 5.9%
n | 584592 | 5.9%
b | 584592 | 5.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 26306640 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
3 | 3507552 | 13.3%
d | 2922960 | 11.1%
9 | 2338368 | 8.9%
- | 2338368 | 8.9%
u | 1753776 | 6.7%
8 | 1753776 | 6.7%
2 | 1753776 | 6.7%
7 | 1169184 | 4.4%
: | 1169184 | 4.4%
c | 1169184 | 4.4%
Other values (10) | 6430512 | 24.4%

institutionCode
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 2338368
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: USNM
2nd row: USNM
3rd row: USNM
4th row: USNM
5th row: USNM

Value | Count | Frequency (%)
usnm | 584592 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
U | 584592 | 25.0%
S | 584592 | 25.0%
N | 584592 | 25.0%
M | 584592 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2338368 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
U | 584592 | 25.0%
S | 584592 | 25.0%
N | 584592 | 25.0%
M | 584592 | 25.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2338368 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
U | 584592 | 25.0%
S | 584592 | 25.0%
N | 584592 | 25.0%
M | 584592 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2338368 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
U | 584592 | 25.0%
S | 584592 | 25.0%
N | 584592 | 25.0%
M | 584592 | 25.0%

collectionCode
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 2922960
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BIRDS
2nd row: BIRDS
3rd row: BIRDS
4th row: BIRDS
5th row: BIRDS

Value | Count | Frequency (%)
birds | 584592 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
B | 584592 | 20.0%
I | 584592 | 20.0%
R | 584592 | 20.0%
D | 584592 | 20.0%
S | 584592 | 20.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2922960 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
B | 584592 | 20.0%
I | 584592 | 20.0%
R | 584592 | 20.0%
D | 584592 | 20.0%
S | 584592 | 20.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2922960 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
B | 584592 | 20.0%
I | 584592 | 20.0%
R | 584592 | 20.0%
D | 584592 | 20.0%
S | 584592 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2922960 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
B | 584592 | 20.0%
I | 584592 | 20.0%
R | 584592 | 20.0%
D | 584592 | 20.0%
S | 584592 | 20.0%

datasetName
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 11107248
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NMNH Extant Biology
2nd row: NMNH Extant Biology
3rd row: NMNH Extant Biology
4th row: NMNH Extant Biology
5th row: NMNH Extant Biology

Value | Count | Frequency (%)
nmnh | 584592 | 33.3%
extant | 584592 | 33.3%
biology | 584592 | 33.3%

Most occurring characters

Value | Count | Frequency (%)
N | 1169184 | 10.5%
(space) | 1169184 | 10.5%
t | 1169184 | 10.5%
o | 1169184 | 10.5%
M | 584592 | 5.3%
H | 584592 | 5.3%
E | 584592 | 5.3%
x | 584592 | 5.3%
a | 584592 | 5.3%
n | 584592 | 5.3%
Other values (5) | 2922960 | 26.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 6430512 | 57.9%
Uppercase Letter | 3507552 | 31.6%
Space Separator | 1169184 | 10.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 1169184 | 18.2%
o | 1169184 | 18.2%
x | 584592 | 9.1%
a | 584592 | 9.1%
n | 584592 | 9.1%
i | 584592 | 9.1%
l | 584592 | 9.1%
g | 584592 | 9.1%
y | 584592 | 9.1%

Uppercase Letter
Value | Count | Frequency (%)
N | 1169184 | 33.3%
M | 584592 | 16.7%
H | 584592 | 16.7%
E | 584592 | 16.7%
B | 584592 | 16.7%

Space Separator
Value | Count | Frequency (%)
(space) | 1169184 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 9938064 | 89.5%
Common | 1169184 | 10.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 1169184 | 11.8%
t | 1169184 | 11.8%
o | 1169184 | 11.8%
M | 584592 | 5.9%
H | 584592 | 5.9%
E | 584592 | 5.9%
x | 584592 | 5.9%
a | 584592 | 5.9%
n | 584592 | 5.9%
B | 584592 | 5.9%
Other values (4) | 2338368 | 23.5%

Common
Value | Count | Frequency (%)
(space) | 1169184 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11107248 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 1169184 | 10.5%
(space) | 1169184 | 10.5%
t | 1169184 | 10.5%
o | 1169184 | 10.5%
M | 584592 | 5.3%
H | 584592 | 5.3%
E | 584592 | 5.3%
x | 584592 | 5.3%
a | 584592 | 5.3%
n | 584592 | 5.3%
Other values (5) | 2922960 | 26.3%

basisOfRecord
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 18
Median length: 18
Mean length: 18
Min length: 18

Characters and Unicode

Total characters: 10522656
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PRESERVED_SPECIMEN
2nd row: PRESERVED_SPECIMEN
3rd row: PRESERVED_SPECIMEN
4th row: PRESERVED_SPECIMEN
5th row: PRESERVED_SPECIMEN

Value | Count | Frequency (%)
preserved_specimen | 584592 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
E | 2922960 | 27.8%
P | 1169184 | 11.1%
R | 1169184 | 11.1%
S | 1169184 | 11.1%
V | 584592 | 5.6%
D | 584592 | 5.6%
_ | 584592 | 5.6%
C | 584592 | 5.6%
I | 584592 | 5.6%
M | 584592 | 5.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 9938064 | 94.4%
Connector Punctuation | 584592 | 5.6%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 2922960 | 29.4%
P | 1169184 | 11.8%
R | 1169184 | 11.8%
S | 1169184 | 11.8%
V | 584592 | 5.9%
D | 584592 | 5.9%
C | 584592 | 5.9%
I | 584592 | 5.9%
M | 584592 | 5.9%
N | 584592 | 5.9%

Connector Punctuation
Value | Count | Frequency (%)
_ | 584592 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 9938064 | 94.4%
Common | 584592 | 5.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 2922960 | 29.4%
P | 1169184 | 11.8%
R | 1169184 | 11.8%
S | 1169184 | 11.8%
V | 584592 | 5.9%
D | 584592 | 5.9%
C | 584592 | 5.9%
I | 584592 | 5.9%
M | 584592 | 5.9%
N | 584592 | 5.9%

Common
Value | Count | Frequency (%)
_ | 584592 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 10522656 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
E | 2922960 | 27.8%
P | 1169184 | 11.1%
R | 1169184 | 11.1%
S | 1169184 | 11.1%
V | 584592 | 5.6%
D | 584592 | 5.6%
_ | 584592 | 5.6%
C | 584592 | 5.6%
I | 584592 | 5.6%
M | 584592 | 5.6%

occurrenceID
Text

Unique 

Distinct: 584592
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 63
Median length: 63
Mean length: 63
Min length: 63

Characters and Unicode

Total characters: 36829296
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 584592
Unique (%): 100.0%

Sample

1st row: http://n2t.net/ark:/65665/300075fa7-edd1-461a-9f08-e6ba501db28c
2nd row: http://n2t.net/ark:/65665/3000df15d-8cee-4e97-92ce-bb2a2eabd590
3rd row: http://n2t.net/ark:/65665/3ec08151f-42be-49b5-868b-d3deeddbd447
4th row: http://n2t.net/ark:/65665/30026d668-b659-45a3-8494-25f389913e98
5th row: http://n2t.net/ark:/65665/3003b6dd3-df37-400f-8ae6-e515ea9c2d04

Value | Count | Frequency (%)
http://n2t.net/ark:/65665/300075fa7-edd1-461a-9f08-e6ba501db28c | 1 | < 0.1%
http://n2t.net/ark:/65665/3ec1dbc05-3709-4356-a820-34fb80d5a314 | 1 | < 0.1%
http://n2t.net/ark:/65665/3ec937490-e545-4db6-812d-bbcfe6057996 | 1 | < 0.1%
http://n2t.net/ark:/65665/302e7b9b3-e03c-4d08-a4a5-3110143884c6 | 1 | < 0.1%
http://n2t.net/ark:/65665/3004420cd-5dd8-4d0b-bb81-5df504988ccf | 1 | < 0.1%
http://n2t.net/ark:/65665/3ec08151f-42be-49b5-868b-d3deeddbd447 | 1 | < 0.1%
http://n2t.net/ark:/65665/30026d668-b659-45a3-8494-25f389913e98 | 1 | < 0.1%
http://n2t.net/ark:/65665/3003b6dd3-df37-400f-8ae6-e515ea9c2d04 | 1 | < 0.1%
http://n2t.net/ark:/65665/3003f1ccb-ef9c-4862-9369-5c82ac27e83e | 1 | < 0.1%
http://n2t.net/ark:/65665/30150f58d-26d0-475b-b905-a9bb8e072667 | 1 | < 0.1%
Other values (584582) | 584582 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
/ | 2922960 | 7.9%
6 | 2852045 | 7.7%
- | 2338368 | 6.3%
t | 2338368 | 6.3%
5 | 2265649 | 6.2%
a | 1827322 | 5.0%
2 | 1681511 | 4.6%
3 | 1680550 | 4.6%
e | 1680227 | 4.6%
4 | 1679992 | 4.6%
Other values (16) | 15562304 | 42.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 15931516 | 43.3%
Lowercase Letter | 13882676 | 37.7%
Other Punctuation | 4676736 | 12.7%
Dash Punctuation | 2338368 | 6.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 2338368 | 16.8%
a | 1827322 | 13.2%
e | 1680227 | 12.1%
b | 1241614 | 8.9%
n | 1169184 | 8.4%
c | 1097145 | 7.9%
f | 1096769 | 7.9%
d | 1093679 | 7.9%
k | 584592 | 4.2%
r | 584592 | 4.2%
Other values (2) | 1169184 | 8.4%

Decimal Number
Value | Count | Frequency (%)
6 | 2852045 | 17.9%
5 | 2265649 | 14.2%
2 | 1681511 | 10.6%
3 | 1680550 | 10.5%
4 | 1679992 | 10.5%
8 | 1243182 | 7.8%
9 | 1240698 | 7.8%
1 | 1096282 | 6.9%
7 | 1096019 | 6.9%
0 | 1095588 | 6.9%

Other Punctuation
Value | Count | Frequency (%)
/ | 2922960 | 62.5%
: | 1169184 | 25.0%
. | 584592 | 12.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 2338368 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 22946620 | 62.3%
Latin | 13882676 | 37.7%

Most frequent character per script

Common
Value | Count | Frequency (%)
/ | 2922960 | 12.7%
6 | 2852045 | 12.4%
- | 2338368 | 10.2%
5 | 2265649 | 9.9%
2 | 1681511 | 7.3%
3 | 1680550 | 7.3%
4 | 1679992 | 7.3%
8 | 1243182 | 5.4%
9 | 1240698 | 5.4%
: | 1169184 | 5.1%
Other values (4) | 3872481 | 16.9%

Latin
Value | Count | Frequency (%)
t | 2338368 | 16.8%
a | 1827322 | 13.2%
e | 1680227 | 12.1%
b | 1241614 | 8.9%
n | 1169184 | 8.4%
c | 1097145 | 7.9%
f | 1096769 | 7.9%
d | 1093679 | 7.9%
k | 584592 | 4.2%
r | 584592 | 4.2%
Other values (2) | 1169184 | 8.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 36829296 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
/ | 2922960 | 7.9%
6 | 2852045 | 7.7%
- | 2338368 | 6.3%
t | 2338368 | 6.3%
5 | 2265649 | 6.2%
a | 1827322 | 5.0%
2 | 1681511 | 4.6%
3 | 1680550 | 4.6%
e | 1680227 | 4.6%
4 | 1679992 | 4.6%
Other values (16) | 15562304 | 42.3%

catalogNumber
Text

Unique 

Distinct: 584592
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 11
Median length: 11
Mean length: 10.92067972
Min length: 6

Characters and Unicode

Total characters: 6384142
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 584592
Unique (%): 100.0%

Sample

1st row: USNM A16396
2nd row: USNM 101402
3rd row: USNM B28085
4th row: USNM 289875
5th row: USNM 562118

Value | Count | Frequency (%)
usnm | 584592 | 50.0%
438818 | 1 | < 0.1%
160226 | 1 | < 0.1%
540920 | 1 | < 0.1%
400497 | 1 | < 0.1%
b28085 | 1 | < 0.1%
289875 | 1 | < 0.1%
562118 | 1 | < 0.1%
b42715 | 1 | < 0.1%
378552 | 1 | < 0.1%
Other values (584583) | 584583 | 50.0%
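
The table above counts words rather than whole values, which is why `usnm` accounts for 50.0% even though every catalogNumber is unique. The following is a minimal sketch of that kind of count, assuming values are lowercased and split on whitespace; this is inferred from the table and is not confirmed by the report. `word_frequencies` is a hypothetical helper, and the input is the five sample rows shown above.

```python
from collections import Counter


def word_frequencies(values):
    """Return {word: (count, percent)} over lowercased, whitespace-split words."""
    # ASSUMPTION: lowercase + whitespace tokenisation, inferred from the table.
    words = [word for value in values for word in value.lower().split()]
    counts = Counter(words)
    total = len(words)
    return {word: (n, 100 * n / total) for word, n in counts.items()}


samples = ["USNM A16396", "USNM 101402", "USNM B28085", "USNM 289875", "USNM 562118"]
freq = word_frequencies(samples)
print(freq["usnm"])
```

Each value contributes two words, the shared prefix and a distinct number, so the prefix makes up half of all words in the sample, as it does in the full column.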

Most occurring characters

Value | Count | Frequency (%)
U | 584592 | 9.2%
S | 584592 | 9.2%
N | 584592 | 9.2%
M | 584592 | 9.2%
(space) | 584592 | 9.2%
3 | 396623 | 6.2%
4 | 396155 | 6.2%
5 | 388165 | 6.1%
1 | 387443 | 6.1%
2 | 382727 | 6.0%
Other values (7) | 1510069 | 23.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3420292 | 53.6%
Uppercase Letter | 2379258 | 37.3%
Space Separator | 584592 | 9.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 396623 | 11.6%
4 | 396155 | 11.6%
5 | 388165 | 11.3%
1 | 387443 | 11.3%
2 | 382727 | 11.2%
6 | 326899 | 9.6%
0 | 287859 | 8.4%
9 | 286088 | 8.4%
8 | 284189 | 8.3%
7 | 284144 | 8.3%

Uppercase Letter
Value | Count | Frequency (%)
U | 584592 | 24.6%
S | 584592 | 24.6%
N | 584592 | 24.6%
M | 584592 | 24.6%
B | 34602 | 1.5%
A | 6288 | 0.3%

Space Separator
Value | Count | Frequency (%)
(space) | 584592 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4004884 | 62.7%
Latin | 2379258 | 37.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
(space) | 584592 | 14.6%
3 | 396623 | 9.9%
4 | 396155 | 9.9%
5 | 388165 | 9.7%
1 | 387443 | 9.7%
2 | 382727 | 9.6%
6 | 326899 | 8.2%
0 | 287859 | 7.2%
9 | 286088 | 7.1%
8 | 284189 | 7.1%

Latin
Value | Count | Frequency (%)
U | 584592 | 24.6%
S | 584592 | 24.6%
N | 584592 | 24.6%
M | 584592 | 24.6%
B | 34602 | 1.5%
A | 6288 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6384142 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
U | 584592 | 9.2%
S | 584592 | 9.2%
N | 584592 | 9.2%
M | 584592 | 9.2%
(space) | 584592 | 9.2%
3 | 396623 | 6.2%
4 | 396155 | 6.2%
5 | 388165 | 6.1%
1 | 387443 | 6.1%
2 | 382727 | 6.0%
Other values (7) | 1510069 | 23.7%

recordNumber
Text

Missing 

Distinct4
Distinct (%)3.4%
Missing584474
Missing (%)> 99.9%
Memory size4.5 MiB

Length

Max length5
Median length1
Mean length1.059322034
Min length1

Characters and Unicode

Total characters125
Distinct characters10
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique3
Unique (%)2.5%

Sample

1st rowl
2nd rowl
3rd rowdu
4th rowl
5th rowl
ValueCountFrequency (%)
l 115
97.5%
du 1
 
0.8%
riley 1
 
0.8%
sta 1
 
0.8%

Most occurring characters

ValueCountFrequency (%)
l 116
92.8%
d 1
 
0.8%
u 1
 
0.8%
r 1
 
0.8%
i 1
 
0.8%
e 1
 
0.8%
y 1
 
0.8%
s 1
 
0.8%
t 1
 
0.8%
a 1
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 125
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 116
92.8%
d 1
 
0.8%
u 1
 
0.8%
r 1
 
0.8%
i 1
 
0.8%
e 1
 
0.8%
y 1
 
0.8%
s 1
 
0.8%
t 1
 
0.8%
a 1
 
0.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 125
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 116
92.8%
d 1
 
0.8%
u 1
 
0.8%
r 1
 
0.8%
i 1
 
0.8%
e 1
 
0.8%
y 1
 
0.8%
s 1
 
0.8%
t 1
 
0.8%
a 1
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 125
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 116
92.8%
d 1
 
0.8%
u 1
 
0.8%
r 1
 
0.8%
i 1
 
0.8%
e 1
 
0.8%
y 1
 
0.8%
s 1
 
0.8%
t 1
 
0.8%
a 1
 
0.8%

recordedBy
Text

Missing 

Distinct13250
Distinct (%)2.3%
Missing7123
Missing (%)1.2%
Memory size4.5 MiB

Length

Max length60
Median length55
Mean length11.76426613
Min length1

Characters and Unicode

Total characters6793499
Distinct characters65
Distinct categories8
Distinct scripts2
Distinct blocks1

Unique

Unique6170
Unique (%)1.1%

Sample

1st rowT. Page
2nd rowC. Worthen
3rd rowH. Lee
4th rowC. Sperry
5th rowC. Ross
ValueCountFrequency (%)
a 64567
 
4.8%
j 60293
 
4.5%
e 58464
 
4.4%
56508
 
4.2%
w 52970
 
4.0%
h 41937
 
3.1%
m 37812
 
2.8%
c 37330
 
2.8%
t 32505
 
2.4%
wetmore 32367
 
2.4%
Other values (7402) 863863
64.5%

Most occurring characters

ValueCountFrequency (%)
761147
 
11.2%
. 558992
 
8.2%
e 547336
 
8.1%
r 485535
 
7.1%
o 389498
 
5.7%
n 353948
 
5.2%
a 303496
 
4.5%
l 299899
 
4.4%
i 264364
 
3.9%
t 245352
 
3.6%
Other values (55) 2583932
38.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4115057
60.6%
Uppercase Letter 1287488
 
19.0%
Space Separator 761147
 
11.2%
Other Punctuation 622658
 
9.2%
Dash Punctuation 3521
 
0.1%
Decimal Number 2824
 
< 0.1%
Open Punctuation 402
 
< 0.1%
Close Punctuation 402
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 547336
13.3%
r 485535
11.8%
o 389498
9.5%
n 353948
 
8.6%
a 303496
 
7.4%
l 299899
 
7.3%
i 264364
 
6.4%
t 245352
 
6.0%
s 161797
 
3.9%
c 132938
 
3.2%
Other values (16) 930894
22.6%
Uppercase Letter
ValueCountFrequency (%)
W 117570
 
9.1%
C 117514
 
9.1%
B 99641
 
7.7%
A 96140
 
7.5%
M 90141
 
7.0%
H 82765
 
6.4%
R 77734
 
6.0%
P 76833
 
6.0%
J 69213
 
5.4%
S 67116
 
5.2%
Other values (16) 392821
30.5%
Other Punctuation
ValueCountFrequency (%)
. 558992
89.8%
& 56427
 
9.1%
, 6619
 
1.1%
' 606
 
0.1%
? 13
 
< 0.1%
/ 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
9 1412
50.0%
1 708
25.1%
8 704
24.9%
Space Separator
ValueCountFrequency (%)
761147
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3521
100.0%
Open Punctuation
ValueCountFrequency (%)
( 402
100.0%
Close Punctuation
ValueCountFrequency (%)
) 402
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5402545
79.5%
Common 1390954
 
20.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 547336
 
10.1%
r 485535
 
9.0%
o 389498
 
7.2%
n 353948
 
6.6%
a 303496
 
5.6%
l 299899
 
5.6%
i 264364
 
4.9%
t 245352
 
4.5%
s 161797
 
3.0%
c 132938
 
2.5%
Other values (42) 2218382
41.1%
Common
ValueCountFrequency (%)
761147
54.7%
. 558992
40.2%
& 56427
 
4.1%
, 6619
 
0.5%
- 3521
 
0.3%
9 1412
 
0.1%
1 708
 
0.1%
8 704
 
0.1%
' 606
 
< 0.1%
( 402
 
< 0.1%
Other values (3) 416
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6793499
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
761147
 
11.2%
. 558992
 
8.2%
e 547336
 
8.1%
r 485535
 
7.1%
o 389498
 
5.7%
n 353948
 
5.2%
a 303496
 
4.5%
l 299899
 
4.4%
i 264364
 
3.9%
t 245352
 
3.6%
Other values (55) 2583932
38.0%
Distinct49
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length3
Median length1
Mean length1.001168336
Min length1

Characters and Unicode

Total characters585275
Distinct characters10
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique14
Unique (%)< 0.1%

Sample

1st row1
2nd row1
3rd row4
4th row1
5th row1
ValueCountFrequency (%)
1 558309
95.5%
2 6799
 
1.2%
4 6794
 
1.2%
3 6435
 
1.1%
5 3136
 
0.5%
6 1204
 
0.2%
7 608
 
0.1%
8 374
 
0.1%
9 251
 
< 0.1%
10 169
 
< 0.1%
Other values (39) 513
 
0.1%

Most occurring characters

ValueCountFrequency (%)
1 559052
95.5%
2 6944
 
1.2%
4 6855
 
1.2%
3 6513
 
1.1%
5 3191
 
0.5%
6 1239
 
0.2%
7 631
 
0.1%
8 397
 
0.1%
9 270
 
< 0.1%
0 183
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 585275
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 559052
95.5%
2 6944
 
1.2%
4 6855
 
1.2%
3 6513
 
1.1%
5 3191
 
0.5%
6 1239
 
0.2%
7 631
 
0.1%
8 397
 
0.1%
9 270
 
< 0.1%
0 183
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 585275
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 559052
95.5%
2 6944
 
1.2%
4 6855
 
1.2%
3 6513
 
1.1%
5 3191
 
0.5%
6 1239
 
0.2%
7 631
 
0.1%
8 397
 
0.1%
9 270
 
< 0.1%
0 183
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 585275
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 559052
95.5%
2 6944
 
1.2%
4 6855
 
1.2%
3 6513
 
1.1%
5 3191
 
0.5%
6 1239
 
0.2%
7 631
 
0.1%
8 397
 
0.1%
9 270
 
< 0.1%
0 183
 
< 0.1%

sex
Text

Missing 

Distinct2
Distinct (%)< 0.1%
Missing112304
Missing (%)19.2%
Memory size4.5 MiB

Length

Max length6
Median length4
Mean length4.817674809
Min length4

Characters and Unicode

Total characters2275330
Distinct characters5
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowMALE
2nd rowFEMALE
3rd rowMALE
4th rowMALE
5th rowMALE
ValueCountFrequency (%)
male 279199
59.1%
female 193089
40.9%

Most occurring characters

ValueCountFrequency (%)
E 665377
29.2%
M 472288
20.8%
A 472288
20.8%
L 472288
20.8%
F 193089
 
8.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2275330
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 665377
29.2%
M 472288
20.8%
A 472288
20.8%
L 472288
20.8%
F 193089
 
8.5%

Most occurring scripts

ValueCountFrequency (%)
Latin 2275330
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 665377
29.2%
M 472288
20.8%
A 472288
20.8%
L 472288
20.8%
F 193089
 
8.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2275330
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 665377
29.2%
M 472288
20.8%
A 472288
20.8%
L 472288
20.8%
F 193089
 
8.5%
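
The percentages in this section use two different denominators: "Missing (%)" is taken over all observations, while the value frequencies are taken over non-missing rows only. The short sketch below rechecks the figures for sex under that reading; the helper name `percent` is hypothetical, and all counts are copied from this report.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_rows = 584592          # number of observations in the dataset
missing = 112304             # rows where sex is missing
value_counts = {"male": 279199, "female": 193089}

present = total_rows - missing                     # non-missing rows
missing_pct = percent(missing, total_rows)         # share of all rows
shares = {k: percent(v, present) for k, v in value_counts.items()}

print(present, missing_pct, shares)
```

The non-missing row count equals the sum of the two value counts, so no third category is hidden in this column.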

lifeStage
Text

Missing 

Distinct7
Distinct (%)< 0.1%
Missing459507
Missing (%)78.6%
Memory size4.5 MiB

Length

Max length8
Median length5
Mean length5.961034497
Min length5

Characters and Unicode

Total characters745636
Distinct characters26
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowImmature
2nd rowJuvenile
3rd rowAdult
4th rowAdult
5th rowAdult
ValueCountFrequency (%)
adult 81111
64.8%
immature 27828
 
22.2%
juvenile 10762
 
8.6%
chick 3709
 
3.0%
subadult 1382
 
1.1%
embryo 292
 
0.2%
nestling 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
u 122465
16.4%
t 110322
14.8%
l 93256
12.5%
d 82493
11.1%
A 81111
10.9%
m 55948
7.5%
e 49353
6.6%
a 29210
 
3.9%
r 28120
 
3.8%
I 27828
 
3.7%
Other values (16) 65530
8.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 620551
83.2%
Uppercase Letter 125085
 
16.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u 122465
19.7%
t 110322
17.8%
l 93256
15.0%
d 82493
13.3%
m 55948
9.0%
e 49353
8.0%
a 29210
 
4.7%
r 28120
 
4.5%
i 14472
 
2.3%
n 10763
 
1.7%
Other values (9) 24149
 
3.9%
Uppercase Letter
ValueCountFrequency (%)
A 81111
64.8%
I 27828
 
22.2%
J 10762
 
8.6%
C 3709
 
3.0%
S 1382
 
1.1%
E 292
 
0.2%
N 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 745636
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
u 122465
16.4%
t 110322
14.8%
l 93256
12.5%
d 82493
11.1%
A 81111
10.9%
m 55948
7.5%
e 49353
6.6%
a 29210
 
3.9%
r 28120
 
3.8%
I 27828
 
3.7%
Other values (16) 65530
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 745636
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
u 122465
16.4%
t 110322
14.8%
l 93256
12.5%
d 82493
11.1%
A 81111
10.9%
m 55948
7.5%
e 49353
6.6%
a 29210
 
3.9%
r 28120
 
3.8%
I 27828
 
3.7%
Other values (16) 65530
8.8%

occurrenceStatus
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters4092144
Distinct characters6
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowPRESENT
2nd rowPRESENT
3rd rowPRESENT
4th rowPRESENT
5th rowPRESENT
ValueCountFrequency (%)
present 584592
100.0%

Most occurring characters

ValueCountFrequency (%)
E 1169184
28.6%
P 584592
14.3%
R 584592
14.3%
S 584592
14.3%
N 584592
14.3%
T 584592
14.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4092144
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 1169184
28.6%
P 584592
14.3%
R 584592
14.3%
S 584592
14.3%
N 584592
14.3%
T 584592
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 4092144
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 1169184
28.6%
P 584592
14.3%
R 584592
14.3%
S 584592
14.3%
N 584592
14.3%
T 584592
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4092144
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 1169184
28.6%
P 584592
14.3%
R 584592
14.3%
S 584592
14.3%
N 584592
14.3%
T 584592
14.3%
Distinct132
Distinct (%)< 0.1%
Missing6
Missing (%)< 0.1%
Memory size4.5 MiB

Length

Max length76
Median length11
Mean length11.71096126
Min length6

Characters and Unicode

Total characters6846064
Distinct characters31
Distinct categories6
Distinct scripts2
Distinct blocks1

Unique

Unique39
Unique (%)< 0.1%

Sample

1st rowSkin: Whole
2nd rowSkin: Whole
3rd rowEgg(s)
4th rowSkeleton: Whole
5th rowSkeleton: Whole
ValueCountFrequency (%)
whole 535339
45.8%
skin 470355
40.2%
skeleton 58626
 
5.0%
egg(s 33064
 
2.8%
fluid 32579
 
2.8%
partial 24616
 
2.1%
nest(s 4794
 
0.4%
feather(s 4784
 
0.4%
mounted 1952
 
0.2%
clutch 967
 
0.1%
Other values (7) 2530
 
0.2%

Most occurring characters

ValueCountFrequency (%)
e 671417
9.8%
l 654016
9.6%
o 595917
8.7%
585020
8.5%
: 562892
8.2%
h 541090
7.9%
W 535338
7.8%
n 532123
7.8%
i 529352
7.7%
S 529335
7.7%
Other values (21) 1109564
16.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4423169
64.6%
Uppercase Letter 1169248
 
17.1%
Space Separator 585020
 
8.5%
Other Punctuation 583343
 
8.5%
Open Punctuation 42642
 
0.6%
Close Punctuation 42642
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 671417
15.2%
l 654016
14.8%
o 595917
13.5%
h 541090
12.2%
n 532123
12.0%
i 529352
12.0%
k 528981
12.0%
t 96879
 
2.2%
g 66128
 
1.5%
a 55099
 
1.2%
Other values (8) 152167
 
3.4%
Uppercase Letter
ValueCountFrequency (%)
W 535338
45.8%
S 529335
45.3%
F 37363
 
3.2%
E 33064
 
2.8%
P 24615
 
2.1%
N 4794
 
0.4%
M 3399
 
0.3%
C 1340
 
0.1%
Other Punctuation
ValueCountFrequency (%)
: 562892
96.5%
; 20451
 
3.5%
Space Separator
ValueCountFrequency (%)
585020
100.0%
Open Punctuation
ValueCountFrequency (%)
( 42642
100.0%
Close Punctuation
ValueCountFrequency (%)
) 42642
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5592417
81.7%
Common 1253647
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 671417
12.0%
l 654016
11.7%
o 595917
10.7%
h 541090
9.7%
W 535338
9.6%
n 532123
9.5%
i 529352
9.5%
S 529335
9.5%
k 528981
9.5%
t 96879
 
1.7%
Other values (16) 377969
6.8%
Common
ValueCountFrequency (%)
585020
46.7%
: 562892
44.9%
( 42642
 
3.4%
) 42642
 
3.4%
; 20451
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6846064
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 671417
9.8%
l 654016
9.6%
o 595917
8.7%
585020
8.5%
: 562892
8.2%
h 541090
7.9%
W 535338
7.8%
n 532123
7.8%
i 529352
7.7%
S 529335
7.7%
Other values (21) 1109564
16.2%

associatedSequences
Text

Missing 

Distinct4430
Distinct (%)98.7%
Missing580105
Missing (%)99.2%
Memory size4.5 MiB

Length

Max length12558
Median length49
Mean length129.0780031
Min length49

Characters and Unicode

Total characters579173
Distinct characters63
Distinct categories7
Distinct scripts2
Distinct blocks1

Unique

Unique4421
Unique (%)98.5%

Sample

1st rowhttps://www.ncbi.nlm.nih.gov/gquery?term=KM080095
2nd rowhttps://www.ncbi.nlm.nih.gov/gquery?term=JQ176229
3rd rowhttps://www.ncbi.nlm.nih.gov/gquery?term=JQ173910
4th rowhttps://www.ncbi.nlm.nih.gov/gquery?term=KU722483
5th rowhttps://www.ncbi.nlm.nih.gov/gquery?term=FJ547617;https://www.ncbi.nlm.nih.gov/gquery?term=FJ547732;https://www.ncbi.nlm.nih.gov/gquery?term=FJ547781;https://www.ncbi.nlm.nih.gov/gquery?term=FJ547782
ValueCountFrequency (%)
https://www.ncbi.nlm.nih.gov/gquery?term=prjna521985 34
 
0.8%
https://www.ncbi.nlm.nih.gov/gquery?term=ay273835 10
 
0.2%
https://www.ncbi.nlm.nih.gov/gquery?term=ay273864 8
 
0.2%
https://www.ncbi.nlm.nih.gov/gquery?term=ay273832 3
 
0.1%
https://www.ncbi.nlm.nih.gov/gquery?term=fj207364 3
 
0.1%
https://www.ncbi.nlm.nih.gov/gquery?term=fj207374 2
 
< 0.1%
https://www.ncbi.nlm.nih.gov/gquery?term=mh778417 2
 
< 0.1%
https://www.ncbi.nlm.nih.gov/gquery?term=dq433197 2
 
< 0.1%
https://www.ncbi.nlm.nih.gov/gquery?term=fj207379 2
 
< 0.1%
https://www.ncbi.nlm.nih.gov/gquery?term=mt456681 1
 
< 0.1%
Other values (4420) 4420
98.5%

Most occurring characters

ValueCountFrequency (%)
. 46361
 
8.0%
/ 34770
 
6.0%
w 34770
 
6.0%
n 34770
 
6.0%
t 34770
 
6.0%
h 23180
 
4.0%
r 23180
 
4.0%
e 23180
 
4.0%
i 23180
 
4.0%
m 23180
 
4.0%
Other values (53) 277832
48.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 359290
62.0%
Other Punctuation 111414
 
19.2%
Decimal Number 71114
 
12.3%
Uppercase Letter 25344
 
4.4%
Math Symbol 11590
 
2.0%
Dash Punctuation 420
 
0.1%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
K 4122
16.3%
J 3684
14.5%
Q 3175
12.5%
U 2477
9.8%
E 1468
 
5.8%
R 1383
 
5.5%
M 1361
 
5.4%
F 1128
 
4.5%
N 849
 
3.3%
S 753
 
3.0%
Other values (16) 4944
19.5%
Lowercase Letter
ValueCountFrequency (%)
w 34770
 
9.7%
n 34770
 
9.7%
t 34770
 
9.7%
h 23180
 
6.5%
r 23180
 
6.5%
e 23180
 
6.5%
i 23180
 
6.5%
m 23180
 
6.5%
g 23180
 
6.5%
q 11590
 
3.2%
Other values (9) 104310
29.0%
Decimal Number
ValueCountFrequency (%)
7 9784
13.8%
1 8422
11.8%
2 7230
10.2%
5 7006
9.9%
4 6944
9.8%
9 6757
9.5%
0 6595
9.3%
3 6298
8.9%
8 6093
8.6%
6 5985
8.4%
Other Punctuation
ValueCountFrequency (%)
. 46361
41.6%
/ 34770
31.2%
? 11590
 
10.4%
: 11590
 
10.4%
; 7103
 
6.4%
Math Symbol
ValueCountFrequency (%)
= 11590
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 420
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 384634
66.4%
Common 194539
33.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
w 34770
 
9.0%
n 34770
 
9.0%
t 34770
 
9.0%
h 23180
 
6.0%
r 23180
 
6.0%
e 23180
 
6.0%
i 23180
 
6.0%
m 23180
 
6.0%
g 23180
 
6.0%
q 11590
 
3.0%
Other values (35) 129654
33.7%
Common
ValueCountFrequency (%)
. 46361
23.8%
/ 34770
17.9%
= 11590
 
6.0%
? 11590
 
6.0%
: 11590
 
6.0%
7 9784
 
5.0%
1 8422
 
4.3%
2 7230
 
3.7%
; 7103
 
3.7%
5 7006
 
3.6%
Other values (8) 39093
20.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 579173
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 46361
 
8.0%
/ 34770
 
6.0%
w 34770
 
6.0%
n 34770
 
6.0%
t 34770
 
6.0%
h 23180
 
4.0%
r 23180
 
4.0%
e 23180
 
4.0%
i 23180
 
4.0%
m 23180
 
4.0%
Other values (53) 277832
48.0%
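
The sample rows suggest that each associatedSequences cell holds one or more NCBI query URLs separated by semicolons, with the accession carried in the `term` query parameter. The character counts above fit that reading: 11590 "=" signs across 4487 non-missing rows leave 7103 separators, which is exactly the reported number of ";" characters. The sketch below extracts the accessions under that assumption; the function name `accessions` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def accessions(cell):
    """Split a semicolon-delimited cell and return each URL's `term` value."""
    result = []
    for url in cell.split(";"):
        url = url.strip()
        if not url:
            continue  # tolerate empty fragments such as a trailing ";"
        # parse_qs maps each query key to a list of values.
        result.extend(parse_qs(urlparse(url).query).get("term", []))
    return result


# Fifth sample row from this section.
row = (
    "https://www.ncbi.nlm.nih.gov/gquery?term=FJ547617;"
    "https://www.ncbi.nlm.nih.gov/gquery?term=FJ547732;"
    "https://www.ncbi.nlm.nih.gov/gquery?term=FJ547781;"
    "https://www.ncbi.nlm.nih.gov/gquery?term=FJ547782"
)
print(accessions(row))  # ['FJ547617', 'FJ547732', 'FJ547781', 'FJ547782']
```

Splitting on ";" before parsing keeps each URL intact, so the result does not depend on how a given Python version treats ";" inside query strings.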

occurrenceRemarks
Text

Missing 

Distinct7341
Distinct (%)60.3%
Missing572414
Missing (%)97.9%
Memory size4.5 MiB

Length

Max length6354
Median length555
Mean length50.68484152
Min length1

Characters and Unicode

Total characters617240
Distinct characters102
Distinct categories15
Distinct scripts2
Distinct blocks3

Unique

Unique6370
Unique (%)52.3%

Sample

1st rowcarcass saved
2nd rowfertile
3rd rowA second soft part color is listed, but it is in French. It needs translated; the handwriting is somewhat smushed and hard to read. Appears to be "Patte et tour des yeux carminis." [Feet and eye ring carmine?]
4th rowbreeding
5th rowW.P. Taylor
ValueCountFrequency (%)
of 4593
 
4.4%
in 2349
 
2.2%
as 2209
 
2.1%
the 2118
 
2.0%
usnm 2055
 
2.0%
tag 1748
 
1.7%
specimens 1534
 
1.5%
cataloged 1516
 
1.4%
1422
 
1.4%
originally 1393
 
1.3%
Other values (10725) 84151
80.1%

Most occurring characters

ValueCountFrequency (%)
92912
15.1%
e 51486
 
8.3%
a 37742
 
6.1%
n 34944
 
5.7%
o 33997
 
5.5%
i 32379
 
5.2%
t 32167
 
5.2%
s 26495
 
4.3%
r 25801
 
4.2%
l 22827
 
3.7%
Other values (92) 226490
36.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 429839
69.6%
Space Separator 92912
 
15.1%
Uppercase Letter 38725
 
6.3%
Decimal Number 34189
 
5.5%
Other Punctuation 18120
 
2.9%
Dash Punctuation 1707
 
0.3%
Open Punctuation 745
 
0.1%
Close Punctuation 743
 
0.1%
Math Symbol 219
 
< 0.1%
Final Punctuation 12
 
< 0.1%
Other values (5) 29
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 51486
12.0%
a 37742
 
8.8%
n 34944
 
8.1%
o 33997
 
7.9%
i 32379
 
7.5%
t 32167
 
7.5%
s 26495
 
6.2%
r 25801
 
6.0%
l 22827
 
5.3%
d 18987
 
4.4%
Other values (20) 113014
26.3%
Uppercase Letter
ValueCountFrequency (%)
S 4637
12.0%
N 4058
 
10.5%
M 3832
 
9.9%
U 3571
 
9.2%
C 3039
 
7.8%
O 2136
 
5.5%
A 1965
 
5.1%
T 1912
 
4.9%
B 1512
 
3.9%
F 1474
 
3.8%
Other values (16) 10589
27.3%
Other Punctuation
ValueCountFrequency (%)
. 7077
39.1%
, 3513
19.4%
: 1896
 
10.5%
; 1812
 
10.0%
" 1381
 
7.6%
# 887
 
4.9%
/ 499
 
2.8%
' 346
 
1.9%
& 321
 
1.8%
? 158
 
0.9%
Other values (5) 230
 
1.3%
Decimal Number
ValueCountFrequency (%)
1 5919
17.3%
2 4779
14.0%
0 3893
11.4%
5 3430
10.0%
3 3030
8.9%
6 3028
8.9%
4 2960
8.7%
9 2714
7.9%
8 2382
7.0%
7 2054
 
6.0%
Math Symbol
ValueCountFrequency (%)
+ 92
42.0%
= 68
31.1%
> 34
 
15.5%
< 14
 
6.4%
± 9
 
4.1%
~ 2
 
0.9%
Dash Punctuation
ValueCountFrequency (%)
- 1699
99.5%
6
 
0.4%
1
 
0.1%
1
 
0.1%
Open Punctuation
ValueCountFrequency (%)
( 633
85.0%
[ 112
 
15.0%
Close Punctuation
ValueCountFrequency (%)
) 632
85.1%
] 111
 
14.9%
Space Separator
ValueCountFrequency (%)
92912
100.0%
Final Punctuation
ValueCountFrequency (%)
12
100.0%
Initial Punctuation
ValueCountFrequency (%)
12
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 11
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 3
100.0%
Other Symbol
ValueCountFrequency (%)
° 2
100.0%
Other Letter
ValueCountFrequency (%)
º 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 468565
75.9%
Common 148675
 
24.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 51486
 
11.0%
a 37742
 
8.1%
n 34944
 
7.5%
o 33997
 
7.3%
i 32379
 
6.9%
t 32167
 
6.9%
s 26495
 
5.7%
r 25801
 
5.5%
l 22827
 
4.9%
d 18987
 
4.1%
Other values (47) 151740
32.4%
Common
ValueCountFrequency (%)
92912
62.5%
. 7077
 
4.8%
1 5919
 
4.0%
2 4779
 
3.2%
0 3893
 
2.6%
, 3513
 
2.4%
5 3430
 
2.3%
3 3030
 
2.0%
6 3028
 
2.0%
4 2960
 
2.0%
Other values (35) 18134
 
12.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 617187
> 99.9%
Punctuation 32
 
< 0.1%
None 21
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
92912
15.1%
e 51486
 
8.3%
a 37742
 
6.1%
n 34944
 
5.7%
o 33997
 
5.5%
i 32379
 
5.2%
t 32167
 
5.2%
s 26495
 
4.3%
r 25801
 
4.2%
l 22827
 
3.7%
Other values (80) 226437
36.7%
Punctuation
ValueCountFrequency (%)
12
37.5%
12
37.5%
6
18.8%
1
 
3.1%
1
 
3.1%
None
ValueCountFrequency (%)
± 9
42.9%
é 3
 
14.3%
ó 2
 
9.5%
ñ 2
 
9.5%
ç 2
 
9.5%
° 2
 
9.5%
º 1
 
4.8%

eventDate
Text

Missing 

Distinct51161
Distinct (%)9.4%
Missing41361
Missing (%)7.1%
Memory size4.5 MiB

Length

Max length21
Median length10
Mean length9.758292513
Min length4

Characters and Unicode

Total characters5301007
Distinct characters12
Distinct categories3
Distinct scripts1
Distinct blocks1

Unique

Unique7962
Unique (%)1.5%

Sample

1st row1859-05
2nd row1883-03-18
3rd row1895-05-26
4th row1924-08-06
5th row1987-04-09
ValueCountFrequency (%)
1865 620
 
0.1%
1877 533
 
0.1%
1966 478
 
0.1%
1926 419
 
0.1%
1939-07 366
 
0.1%
1937 360
 
0.1%
1936 281
 
0.1%
1884 276
 
0.1%
1888 253
 
< 0.1%
1881 250
 
< 0.1%
Other values (51151) 539395
99.3%

Most occurring characters

ValueCountFrequency (%)
- 1042086
19.7%
1 1031643
19.5%
0 807962
15.2%
9 611293
11.5%
2 400793
 
7.6%
8 308847
 
5.8%
6 249386
 
4.7%
3 225349
 
4.3%
5 223549
 
4.2%
4 216076
 
4.1%
Other values (2) 184023
 
3.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4258556
80.3%
Dash Punctuation 1042086
 
19.7%
Other Punctuation 365
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 1031643
24.2%
0 807962
19.0%
9 611293
14.4%
2 400793
 
9.4%
8 308847
 
7.3%
6 249386
 
5.9%
3 225349
 
5.3%
5 223549
 
5.2%
4 216076
 
5.1%
7 183658
 
4.3%
Dash Punctuation
ValueCountFrequency (%)
- 1042086
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 365
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5301007
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
- 1042086
19.7%
1 1031643
19.5%
0 807962
15.2%
9 611293
11.5%
2 400793
 
7.6%
8 308847
 
5.8%
6 249386
 
4.7%
3 225349
 
4.3%
5 223549
 
4.2%
4 216076
 
4.1%
Other values (2) 184023
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5301007
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 1042086
19.7%
1 1031643
19.5%
0 807962
15.2%
9 611293
11.5%
2 400793
 
7.6%
8 308847
 
5.8%
6 249386
 
4.7%
3 225349
 
4.3%
5 223549
 
4.2%
4 216076
 
4.1%
Other values (2) 184023
 
3.5%
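
The eventDate values come at mixed precision: year only (`1865`), year and month (`1859-05`), and full dates (`1883-03-18`). The 365 "/" characters and the maximum length of 21 also suggest a small number of `start/end` ranges, although no range appears among the samples, so that part is an assumption. The sketch below parses these shapes into `(year, month, day)` tuples with `None` for absent parts; the function names are hypothetical.

```python
import re

# Year, optionally followed by -MM, optionally followed by -DD.
_PARTIAL = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_partial(text):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'; return None if malformed."""
    match = _PARTIAL.match(text)
    if match is None:
        return None
    year, month, day = match.groups()
    return (int(year), int(month) if month else None, int(day) if day else None)


def parse_event_date(text):
    """Return (start, end) tuples; a single date yields start == end."""
    parts = text.split("/")
    if len(parts) > 2:
        return None
    parsed = [parse_partial(part) for part in parts]
    if any(item is None for item in parsed):
        return None
    return parsed[0], parsed[-1]


print(parse_event_date("1859-05"))     # ((1859, 5, None), (1859, 5, None))
print(parse_event_date("1883-03-18"))  # ((1883, 3, 18), (1883, 3, 18))
```

Keeping absent parts as `None` rather than defaulting to the first day or month avoids inventing precision the record does not have.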

startDayOfYear
Text

Missing 

Distinct366
Distinct (%)0.1%
Missing74069
Missing (%)12.7%
Memory size4.5 MiB

Length

Max length3
Median length3
Mean length2.717769816
Min length1

Characters and Unicode

Total characters1387484
Distinct characters10
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row77
2nd row146
3rd row219
4th row99
5th row274
ValueCountFrequency (%)
140 2507
 
0.5%
141 2480
 
0.5%
134 2428
 
0.5%
135 2416
 
0.5%
150 2400
 
0.5%
142 2384
 
0.5%
136 2383
 
0.5%
166 2363
 
0.5%
139 2355
 
0.5%
132 2336
 
0.5%
Other values (356) 486471
95.3%

Most occurring characters

ValueCountFrequency (%)
1 294060
21.2%
2 227022
16.4%
3 170842
12.3%
5 108195
 
7.8%
4 107779
 
7.8%
6 104873
 
7.6%
7 96252
 
6.9%
8 93293
 
6.7%
9 92796
 
6.7%
0 92372
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1387484
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 294060
21.2%
2 227022
16.4%
3 170842
12.3%
5 108195
 
7.8%
4 107779
 
7.8%
6 104873
 
7.6%
7 96252
 
6.9%
8 93293
 
6.7%
9 92796
 
6.7%
0 92372
 
6.7%

Most occurring scripts

ValueCountFrequency (%)
Common 1387484
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 294060
21.2%
2 227022
16.4%
3 170842
12.3%
5 108195
 
7.8%
4 107779
 
7.8%
6 104873
 
7.6%
7 96252
 
6.9%
8 93293
 
6.7%
9 92796
 
6.7%
0 92372
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1387484
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 294060
21.2%
2 227022
16.4%
3 170842
12.3%
5 108195
 
7.8%
4 107779
 
7.8%
6 104873
 
7.6%
7 96252
 
6.9%
8 93293
 
6.7%
9 92796
 
6.7%
0 92372
 
6.7%
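
The startDayOfYear samples line up with the full-precision eventDate samples listed earlier in this report: 1883-03-18 is day 77, 1895-05-26 is day 146, 1924-08-06 is day 219 (1924 is a leap year), and 1987-04-09 is day 99. The sketch below shows that conversion using only the standard library; the function name `day_of_year` is hypothetical, and it assumes a complete `YYYY-MM-DD` string, which would explain why this column has more missing values than eventDate.

```python
from datetime import date


def day_of_year(iso_date):
    """Return the 1-based ordinal day within the year for 'YYYY-MM-DD'."""
    return date.fromisoformat(iso_date).timetuple().tm_yday


for text in ["1883-03-18", "1895-05-26", "1924-08-06", "1987-04-09"]:
    print(text, day_of_year(text))
```

The 366 distinct values reported for this column are consistent with leap-year dates being present, since day 366 occurs only on 31 December of a leap year.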

endDayOfYear
Text

Missing 

Distinct366
Distinct (%)0.1%
Missing74069
Missing (%)12.7%
Memory size4.5 MiB

Length

Max length3
Median length3
Mean length2.717808992
Min length1

Characters and Unicode

Total characters1387504
Distinct characters10
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row77
2nd row146
3rd row219
4th row99
5th row274
ValueCountFrequency (%)
140 2508
 
0.5%
141 2480
 
0.5%
134 2427
 
0.5%
135 2416
 
0.5%
150 2398
 
0.5%
136 2383
 
0.5%
142 2380
 
0.5%
166 2363
 
0.5%
139 2354
 
0.5%
132 2338
 
0.5%
Other values (356) 486476
95.3%

Most occurring characters

ValueCountFrequency (%)
1 294066
21.2%
2 227032
16.4%
3 170858
12.3%
5 108162
 
7.8%
4 107797
 
7.8%
6 104854
 
7.6%
7 96249
 
6.9%
8 93293
 
6.7%
9 92805
 
6.7%
0 92388
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1387504
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 294066
21.2%
2 227032
16.4%
3 170858
12.3%
5 108162
 
7.8%
4 107797
 
7.8%
6 104854
 
7.6%
7 96249
 
6.9%
8 93293
 
6.7%
9 92805
 
6.7%
0 92388
 
6.7%

Most occurring scripts

ValueCountFrequency (%)
Common 1387504
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 294066
21.2%
2 227032
16.4%
3 170858
12.3%
5 108162
 
7.8%
4 107797
 
7.8%
6 104854
 
7.6%
7 96249
 
6.9%
8 93293
 
6.7%
9 92805
 
6.7%
0 92388
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1387504
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 294066
21.2%
2 227032
16.4%
3 170858
12.3%
5 108162
 
7.8%
4 107797
 
7.8%
6 104854
 
7.6%
7 96249
 
6.9%
8 93293
 
6.7%
9 92805
 
6.7%
0 92388
 
6.7%

year
Text

Missing 

Distinct204
Distinct (%)< 0.1%
Missing41376
Missing (%)7.1%
Memory size4.5 MiB

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters2172864
Distinct characters10
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique4
Unique (%)< 0.1%

Sample

1st row1859
2nd row1883
3rd row1895
4th row1924
5th row1987
ValueCountFrequency (%)
1965 14461
 
2.7%
1964 13001
 
2.4%
1966 10898
 
2.0%
1912 9421
 
1.7%
1911 8196
 
1.5%
1949 8030
 
1.5%
1923 7871
 
1.4%
1950 6975
 
1.3%
1967 6970
 
1.3%
1892 6943
 
1.3%
Other values (194) 450450
82.9%

Most occurring characters

ValueCountFrequency (%)
1 642014
29.5%
9 524891
24.2%
8 217754
 
10.0%
6 138209
 
6.4%
0 135629
 
6.2%
2 113128
 
5.2%
4 110538
 
5.1%
5 102840
 
4.7%
3 100862
 
4.6%
7 86999
 
4.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2172864
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 642014
29.5%
9 524891
24.2%
8 217754
 
10.0%
6 138209
 
6.4%
0 135629
 
6.2%
2 113128
 
5.2%
4 110538
 
5.1%
5 102840
 
4.7%
3 100862
 
4.6%
7 86999
 
4.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2172864
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 642014
29.5%
9 524891
24.2%
8 217754
 
10.0%
6 138209
 
6.4%
0 135629
 
6.2%
2 113128
 
5.2%
4 110538
 
5.1%
5 102840
 
4.7%
3 100862
 
4.6%
7 86999
 
4.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2172864
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 642014
29.5%
9 524891
24.2%
8 217754
 
10.0%
6 138209
 
6.4%
0 135629
 
6.2%
2 113128
 
5.2%
4 110538
 
5.1%
5 102840
 
4.7%
3 100862
 
4.6%
7 86999
 
4.0%

month
Text

Missing 

Distinct 12
Distinct (%) < 0.1%
Missing 53877
Missing (%) 9.2%
Memory size 4.5 MiB

Length

Max length 2
Median length 1
Mean length 1.178898279
Min length 1

Characters and Unicode

Total characters 625659
Distinct characters 10
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 5
2nd row 3
3rd row 5
4th row 8
5th row 4
Value             Count  Frequency (%)
5                 70341  13.3%
6                 61173  11.5%
4                 54173  10.2%
3                 50525  9.5%
7                 46973  8.9%
2                 40464  7.6%
8                 39913  7.5%
9                 37742  7.1%
10                35465  6.7%
1                 34467  6.5%
Other values (2)  59479  11.2%

Most occurring characters

ValueCountFrequency (%)
1 160144
25.6%
5 70341
11.2%
2 69210
11.1%
6 61173
 
9.8%
4 54173
 
8.7%
3 50525
 
8.1%
7 46973
 
7.5%
8 39913
 
6.4%
9 37742
 
6.0%
0 35465
 
5.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 625659
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 160144
25.6%
5 70341
11.2%
2 69210
11.1%
6 61173
 
9.8%
4 54173
 
8.7%
3 50525
 
8.1%
7 46973
 
7.5%
8 39913
 
6.4%
9 37742
 
6.0%
0 35465
 
5.7%

Most occurring scripts

ValueCountFrequency (%)
Common 625659
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 160144
25.6%
5 70341
11.2%
2 69210
11.1%
6 61173
 
9.8%
4 54173
 
8.7%
3 50525
 
8.1%
7 46973
 
7.5%
8 39913
 
6.4%
9 37742
 
6.0%
0 35465
 
5.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 625659
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 160144
25.6%
5 70341
11.2%
2 69210
11.1%
6 61173
 
9.8%
4 54173
 
8.7%
3 50525
 
8.1%
7 46973
 
7.5%
8 39913
 
6.4%
9 37742
 
6.0%
0 35465
 
5.7%

day
Text

Missing 

Distinct 31
Distinct (%) < 0.1%
Missing 74434
Missing (%) 12.7%
Memory size 4.5 MiB

Length

Max length 2
Median length 2
Mean length 1.707414174
Min length 1

Characters and Unicode

Total characters 871051
Distinct characters 10
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 18
2nd row 26
3rd row 6
4th row 9
5th row 1
Value              Count   Frequency (%)
20                 17976   3.5%
10                 17940   3.5%
8                  17679   3.5%
15                 17667   3.5%
21                 17460   3.4%
12                 17459   3.4%
24                 17311   3.4%
22                 17146   3.4%
4                  17141   3.4%
16                 17122   3.4%
Other values (21)  335257  65.7%

Most occurring characters

ValueCountFrequency (%)
1 228583
26.2%
2 217994
25.0%
3 73728
 
8.5%
4 51207
 
5.9%
8 50991
 
5.9%
0 50799
 
5.8%
5 50165
 
5.8%
6 49818
 
5.7%
7 49449
 
5.7%
9 48317
 
5.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 871051
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 228583
26.2%
2 217994
25.0%
3 73728
 
8.5%
4 51207
 
5.9%
8 50991
 
5.9%
0 50799
 
5.8%
5 50165
 
5.8%
6 49818
 
5.7%
7 49449
 
5.7%
9 48317
 
5.5%

Most occurring scripts

ValueCountFrequency (%)
Common 871051
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 228583
26.2%
2 217994
25.0%
3 73728
 
8.5%
4 51207
 
5.9%
8 50991
 
5.9%
0 50799
 
5.8%
5 50165
 
5.8%
6 49818
 
5.7%
7 49449
 
5.7%
9 48317
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 871051
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 228583
26.2%
2 217994
25.0%
3 73728
 
8.5%
4 51207
 
5.9%
8 50991
 
5.9%
0 50799
 
5.8%
5 50165
 
5.8%
6 49818
 
5.7%
7 49449
 
5.7%
9 48317
 
5.5%

verbatimEventDate
Text

Missing 

Distinct 43994
Distinct (%) 12.6%
Missing 235442
Missing (%) 40.3%
Memory size 4.5 MiB

Length

Max length 60
Median length 11
Mean length 10.64060719
Min length 1

Characters and Unicode

Total characters 3715168
Distinct characters 77
Distinct categories 11
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 10311
Unique (%) 3.0%

Sample

1st row -- May 1859
2nd row 18 Mar 1883
3rd row 26 May 1895
4th row 6 Aug 1924
5th row 9 Apr 1987
Value               Count   Frequency (%)
(empty)             149965  14.3%
may                 43235   4.1%
jun                 37603   3.6%
apr                 31571   3.0%
mar                 27292   2.6%
jul                 27206   2.6%
aug                 23700   2.3%
feb                 21866   2.1%
sep                 21167   2.0%
jan                 18181   1.7%
Other values (727)  644585  61.6%

Most occurring characters

ValueCountFrequency (%)
697221
18.8%
1 503447
13.6%
- 381992
 
10.3%
9 327404
 
8.8%
2 174483
 
4.7%
8 174195
 
4.7%
6 106883
 
2.9%
3 99628
 
2.7%
4 93965
 
2.5%
a 89421
 
2.4%
Other values (67) 1066529
28.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1729072
46.5%
Space Separator 697221
18.8%
Lowercase Letter 605710
 
16.3%
Dash Punctuation 381992
 
10.3%
Uppercase Letter 300576
 
8.1%
Other Punctuation 550
 
< 0.1%
Close Punctuation 16
 
< 0.1%
Open Punctuation 16
 
< 0.1%
Math Symbol 8
 
< 0.1%
Format 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 89421
14.8%
u 89025
14.7%
r 60311
10.0%
e 56954
9.4%
n 56861
9.4%
p 53345
8.8%
y 43351
7.2%
c 31070
 
5.1%
l 28240
 
4.7%
g 24158
 
4.0%
Other values (14) 72974
12.0%
Uppercase Letter
ValueCountFrequency (%)
J 83206
27.7%
M 70639
23.5%
A 55418
18.4%
F 22373
 
7.4%
S 21945
 
7.3%
O 18184
 
6.0%
N 15173
 
5.0%
D 12829
 
4.3%
W 412
 
0.1%
I 175
 
0.1%
Other values (14) 222
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 503447
29.1%
9 327404
18.9%
2 174483
 
10.1%
8 174195
 
10.1%
6 106883
 
6.2%
3 99628
 
5.8%
4 93965
 
5.4%
0 87653
 
5.1%
5 82545
 
4.8%
7 78869
 
4.6%
Other Punctuation
ValueCountFrequency (%)
/ 176
32.0%
. 144
26.2%
, 89
16.2%
? 49
 
8.9%
' 34
 
6.2%
: 32
 
5.8%
& 12
 
2.2%
\ 11
 
2.0%
" 2
 
0.4%
# 1
 
0.2%
Close Punctuation
ValueCountFrequency (%)
] 8
50.0%
) 8
50.0%
Open Punctuation
ValueCountFrequency (%)
( 8
50.0%
[ 8
50.0%
Space Separator
ValueCountFrequency (%)
697221
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 381992
100.0%
Math Symbol
ValueCountFrequency (%)
= 8
100.0%
Format
ValueCountFrequency (%)
4
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2808882
75.6%
Latin 906286
 
24.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 89421
 
9.9%
u 89025
 
9.8%
J 83206
 
9.2%
M 70639
 
7.8%
r 60311
 
6.7%
e 56954
 
6.3%
n 56861
 
6.3%
A 55418
 
6.1%
p 53345
 
5.9%
y 43351
 
4.8%
Other values (38) 247755
27.3%
Common
ValueCountFrequency (%)
697221
24.8%
1 503447
17.9%
- 381992
13.6%
9 327404
11.7%
2 174483
 
6.2%
8 174195
 
6.2%
6 106883
 
3.8%
3 99628
 
3.5%
4 93965
 
3.3%
0 87653
 
3.1%
Other values (19) 162011
 
5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3715164
> 99.9%
Punctuation 4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
697221
18.8%
1 503447
13.6%
- 381992
 
10.3%
9 327404
 
8.8%
2 174483
 
4.7%
8 174195
 
4.7%
6 106883
 
2.9%
3 99628
 
2.7%
4 93965
 
2.5%
a 89421
 
2.4%
Other values (66) 1066525
28.7%
Punctuation
ValueCountFrequency (%)
4
100.0%

habitat
Text

Missing 

Distinct 4924
Distinct (%) 28.6%
Missing 567355
Missing (%) 97.1%
Memory size 4.5 MiB

Length

Max length 191
Median length 141
Mean length 27.13418808
Min length 3

Characters and Unicode

Total characters 467712
Distinct characters 82
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3478
Unique (%) 20.2%

Sample

1st row IN OPEN OCEAN AT 0835
2nd row dense marshy grass
3rd row Along lake shore, water and dead brush
4th row airport
5th row montane forest edge
Value                Count  Frequency (%)
forest               6854   9.3%
with                 2343   3.2%
open                 1915   2.6%
of                   1628   2.2%
in                   1549   2.1%
and                  1461   2.0%
scrub                1279   1.7%
edge                 1213   1.6%
(empty)              945    1.3%
on                   919    1.2%
Other values (2526)  53491  72.7%

Most occurring characters

ValueCountFrequency (%)
56360
 
12.1%
e 41846
 
8.9%
o 33590
 
7.2%
a 33285
 
7.1%
s 31715
 
6.8%
r 31696
 
6.8%
t 25427
 
5.4%
n 24905
 
5.3%
i 21550
 
4.6%
l 17791
 
3.8%
Other values (72) 149547
32.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 376043
80.4%
Space Separator 56360
 
12.1%
Uppercase Letter 25209
 
5.4%
Other Punctuation 6419
 
1.4%
Dash Punctuation 1495
 
0.3%
Decimal Number 1436
 
0.3%
Open Punctuation 365
 
0.1%
Close Punctuation 365
 
0.1%
Math Symbol 20
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 41846
11.1%
o 33590
 
8.9%
a 33285
 
8.9%
s 31715
 
8.4%
r 31696
 
8.4%
t 25427
 
6.8%
n 24905
 
6.6%
i 21550
 
5.7%
l 17791
 
4.7%
d 17630
 
4.7%
Other values (16) 96608
25.7%
Uppercase Letter
ValueCountFrequency (%)
O 2706
 
10.7%
E 2400
 
9.5%
R 2179
 
8.6%
A 2016
 
8.0%
N 1705
 
6.8%
S 1663
 
6.6%
L 1499
 
5.9%
I 1493
 
5.9%
T 1420
 
5.6%
C 1283
 
5.1%
Other values (16) 6845
27.2%
Other Punctuation
ValueCountFrequency (%)
, 4306
67.1%
. 571
 
8.9%
; 545
 
8.5%
& 456
 
7.1%
/ 439
 
6.8%
" 28
 
0.4%
: 27
 
0.4%
' 25
 
0.4%
? 15
 
0.2%
# 4
 
0.1%
Decimal Number
ValueCountFrequency (%)
0 592
41.2%
5 303
21.1%
1 155
 
10.8%
2 135
 
9.4%
3 126
 
8.8%
4 53
 
3.7%
6 27
 
1.9%
8 17
 
1.2%
7 17
 
1.2%
9 11
 
0.8%
Math Symbol
ValueCountFrequency (%)
+ 8
40.0%
< 7
35.0%
= 5
25.0%
Open Punctuation
ValueCountFrequency (%)
( 350
95.9%
[ 15
 
4.1%
Close Punctuation
ValueCountFrequency (%)
) 350
95.9%
] 15
 
4.1%
Space Separator
ValueCountFrequency (%)
56360
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1495
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 401252
85.8%
Common 66460
 
14.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 41846
 
10.4%
o 33590
 
8.4%
a 33285
 
8.3%
s 31715
 
7.9%
r 31696
 
7.9%
t 25427
 
6.3%
n 24905
 
6.2%
i 21550
 
5.4%
l 17791
 
4.4%
d 17630
 
4.4%
Other values (42) 121817
30.4%
Common
ValueCountFrequency (%)
56360
84.8%
, 4306
 
6.5%
- 1495
 
2.2%
0 592
 
0.9%
. 571
 
0.9%
; 545
 
0.8%
& 456
 
0.7%
/ 439
 
0.7%
( 350
 
0.5%
) 350
 
0.5%
Other values (20) 996
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 467712
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
56360
 
12.1%
e 41846
 
8.9%
o 33590
 
7.2%
a 33285
 
7.1%
s 31715
 
6.8%
r 31696
 
6.8%
t 25427
 
5.4%
n 24905
 
5.3%
i 21550
 
4.6%
l 17791
 
3.8%
Other values (72) 149547
32.0%

Distinct 6798
Distinct (%) 1.2%
Missing 2
Missing (%) < 0.1%
Memory size 4.5 MiB

Length

Max length 95
Median length 75
Mean length 36.76763373
Min length 4

Characters and Unicode

Total characters 21493991
Distinct characters 74
Distinct categories 9
Distinct scripts 2
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1458
Unique (%) 0.2%

Sample

1st row South America, Paraguay, Asuncion
2nd row North America, United States, Florida
3rd row North America, United States, South Dakota
4th row North America, United States, Maine
5th row Asia, Philippines, Palawan, Palawan Province
Value                Count    Frequency (%)
america              389870   13.5%
north                349097   12.1%
united               213165   7.4%
states               211488   7.4%
asia                 94981    3.3%
south                88499    3.1%
africa               52986    1.8%
mexico               32547    1.1%
panama               31800    1.1%
colombia             28517    1.0%
Other values (4797)  1384325  48.1%

Most occurring characters

ValueCountFrequency (%)
2292685
 
10.7%
a 2269264
 
10.6%
i 1576409
 
7.3%
e 1449846
 
6.7%
t 1415514
 
6.6%
r 1302972
 
6.1%
, 1293349
 
6.0%
o 1083406
 
5.0%
n 1034429
 
4.8%
s 708939
 
3.3%
Other values (64) 7067178
32.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 14996132
69.8%
Uppercase Letter 2873711
 
13.4%
Space Separator 2292685
 
10.7%
Other Punctuation 1312996
 
6.1%
Dash Punctuation 16199
 
0.1%
Open Punctuation 1132
 
< 0.1%
Close Punctuation 1131
 
< 0.1%
Decimal Number 3
 
< 0.1%
Math Symbol 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2269264
15.1%
i 1576409
10.5%
e 1449846
9.7%
t 1415514
9.4%
r 1302972
8.7%
o 1083406
 
7.2%
n 1034429
 
6.9%
s 708939
 
4.7%
c 702577
 
4.7%
h 651096
 
4.3%
Other values (19) 2801680
18.7%
Uppercase Letter
ValueCountFrequency (%)
A 655752
22.8%
N 422462
14.7%
S 371915
12.9%
U 235640
 
8.2%
C 213796
 
7.4%
M 131471
 
4.6%
P 129994
 
4.5%
I 77028
 
2.7%
B 69307
 
2.4%
T 68706
 
2.4%
Other values (16) 497640
17.3%
Other Punctuation
ValueCountFrequency (%)
, 1293349
98.5%
' 6575
 
0.5%
. 5181
 
0.4%
? 4098
 
0.3%
/ 3790
 
0.3%
& 2
 
< 0.1%
\ 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 1
33.3%
8 1
33.3%
6 1
33.3%
Dash Punctuation
ValueCountFrequency (%)
- 16138
99.6%
61
 
0.4%
Open Punctuation
ValueCountFrequency (%)
( 1125
99.4%
[ 7
 
0.6%
Close Punctuation
ValueCountFrequency (%)
) 1124
99.4%
] 7
 
0.6%
Math Symbol
ValueCountFrequency (%)
+ 1
50.0%
~ 1
50.0%
Space Separator
ValueCountFrequency (%)
2292685
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17869843
83.1%
Common 3624148
 
16.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2269264
12.7%
i 1576409
 
8.8%
e 1449846
 
8.1%
t 1415514
 
7.9%
r 1302972
 
7.3%
o 1083406
 
6.1%
n 1034429
 
5.8%
s 708939
 
4.0%
c 702577
 
3.9%
A 655752
 
3.7%
Other values (45) 5670735
31.7%
Common
ValueCountFrequency (%)
2292685
63.3%
, 1293349
35.7%
- 16138
 
0.4%
' 6575
 
0.2%
. 5181
 
0.1%
? 4098
 
0.1%
/ 3790
 
0.1%
( 1125
 
< 0.1%
) 1124
 
< 0.1%
61
 
< 0.1%
Other values (9) 22
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21493923
> 99.9%
Punctuation 61
 
< 0.1%
None 7
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2292685
 
10.7%
a 2269264
 
10.6%
i 1576409
 
7.3%
e 1449846
 
6.7%
t 1415514
 
6.6%
r 1302972
 
6.1%
, 1293349
 
6.0%
o 1083406
 
5.0%
n 1034429
 
4.8%
s 708939
 
3.3%
Other values (60) 7067110
32.9%
Punctuation
ValueCountFrequency (%)
61
100.0%
None
ValueCountFrequency (%)
ô 4
57.1%
é 2
28.6%
ä 1
 
14.3%

continent
Text

Missing 

Distinct 7
Distinct (%) < 0.1%
Missing 27500
Missing (%) 4.7%
Memory size 4.5 MiB

Length

Max length 13
Median length 13
Mean length 10.59729093
Min length 4

Characters and Unicode

Total characters 5903666
Distinct characters 15
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row SOUTH_AMERICA
2nd row NORTH_AMERICA
3rd row NORTH_AMERICA
4th row NORTH_AMERICA
5th row ASIA
Value          Count   Frequency (%)
north_america  322157  57.8%
asia           96833   17.4%
south_america  69099   12.4%
africa         47406   8.5%
oceania        11848   2.1%
europe         8714    1.6%
antarctica     1035    0.2%

Most occurring characters

ValueCountFrequency (%)
A 1097791
18.6%
R 770568
13.1%
I 548378
9.3%
C 452580
7.7%
E 420532
 
7.1%
O 411818
 
7.0%
T 393326
 
6.7%
H 391256
 
6.6%
_ 391256
 
6.6%
M 391256
 
6.6%
Other values (5) 634905
10.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5512410
93.4%
Connector Punctuation 391256
 
6.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 1097791
19.9%
R 770568
14.0%
I 548378
9.9%
C 452580
8.2%
E 420532
 
7.6%
O 411818
 
7.5%
T 393326
 
7.1%
H 391256
 
7.1%
M 391256
 
7.1%
N 335040
 
6.1%
Other values (4) 299865
 
5.4%
Connector Punctuation
ValueCountFrequency (%)
_ 391256
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5512410
93.4%
Common 391256
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 1097791
19.9%
R 770568
14.0%
I 548378
9.9%
C 452580
8.2%
E 420532
 
7.6%
O 411818
 
7.5%
T 393326
 
7.1%
H 391256
 
7.1%
M 391256
 
7.1%
N 335040
 
6.1%
Other values (4) 299865
 
5.4%
Common
ValueCountFrequency (%)
_ 391256
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5903666
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 1097791
18.6%
R 770568
13.1%
I 548378
9.3%
C 452580
7.7%
E 420532
 
7.1%
O 411818
 
7.0%
T 393326
 
6.7%
H 391256
 
6.6%
_ 391256
 
6.6%
M 391256
 
6.6%
Other values (5) 634905
10.8%

waterBody
Text

Missing 

Distinct 67
Distinct (%) 0.3%
Missing 558515
Missing (%) 95.5%
Memory size 4.5 MiB

Length

Max length 55
Median length 19
Mean length 20.14311462
Min length 8

Characters and Unicode

Total characters 525272
Distinct characters 45
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 18
Unique (%) 0.1%

Sample

1st row Arctic Ocean
2nd row North Pacific Ocean
3rd row North Pacific Ocean
4th row North Pacific Ocean
5th row North Pacific Ocean
Value              Count  Frequency (%)
ocean              26055  32.3%
pacific            19043  23.6%
north              16048  19.9%
south              6719   8.3%
atlantic           4113   5.1%
indian             2690   3.3%
sea                2523   3.1%
mediterranean      1992   2.5%
weddell            131    0.2%
arctic             125    0.2%
Other values (57)  1126   1.4%

Most occurring characters

ValueCountFrequency (%)
c 68650
13.1%
a 59282
11.3%
54488
10.4%
i 47442
9.0%
n 40099
 
7.6%
e 35322
 
6.7%
t 33362
 
6.4%
O 26120
 
5.0%
o 23090
 
4.4%
h 23023
 
4.4%
Other values (35) 114394
21.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 387593
73.8%
Uppercase Letter 80498
 
15.3%
Space Separator 54488
 
10.4%
Other Punctuation 2693
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 68650
17.7%
a 59282
15.3%
i 47442
12.2%
n 40099
10.3%
e 35322
9.1%
t 33362
8.6%
o 23090
 
6.0%
h 23023
 
5.9%
r 20677
 
5.3%
f 19227
 
5.0%
Other values (14) 17419
 
4.5%
Uppercase Letter
ValueCountFrequency (%)
O 26120
32.4%
P 19099
23.7%
N 16052
19.9%
S 9360
 
11.6%
A 4242
 
5.3%
I 2690
 
3.3%
M 1995
 
2.5%
B 240
 
0.3%
C 217
 
0.3%
W 158
 
0.2%
Other values (8) 325
 
0.4%
Other Punctuation
ValueCountFrequency (%)
, 2671
99.2%
? 22
 
0.8%
Space Separator
ValueCountFrequency (%)
54488
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 468091
89.1%
Common 57181
 
10.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
c 68650
14.7%
a 59282
12.7%
i 47442
10.1%
n 40099
8.6%
e 35322
 
7.5%
t 33362
 
7.1%
O 26120
 
5.6%
o 23090
 
4.9%
h 23023
 
4.9%
r 20677
 
4.4%
Other values (32) 91024
19.4%
Common
ValueCountFrequency (%)
54488
95.3%
, 2671
 
4.7%
? 22
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 525272
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 68650
13.1%
a 59282
11.3%
54488
10.4%
i 47442
9.0%
n 40099
 
7.6%
e 35322
 
6.7%
t 33362
 
6.4%
O 26120
 
5.0%
o 23090
 
4.4%
h 23023
 
4.4%
Other values (35) 114394
21.8%

Distinct 216
Distinct (%) < 0.1%
Missing 3736
Missing (%) 0.6%
Memory size 4.5 MiB

Length

Max length 2
Median length 2
Mean length 2
Min length 2

Characters and Unicode

Total characters 1161712
Distinct characters 26
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 8
Unique (%) < 0.1%

Sample

1st row PY
2nd row US
3rd row US
4th row US
5th row PH
Value               Count   Frequency (%)
us                  216836  37.3%
co                  28553   4.9%
mx                  28229   4.9%
pa                  27171   4.7%
ca                  17452   3.0%
th                  17424   3.0%
ph                  16446   2.8%
zz                  16268   2.8%
cn                  14054   2.4%
id                  13339   2.3%
Other values (206)  185084  31.9%

Most occurring characters

ValueCountFrequency (%)
U 233998
20.1%
S 224421
19.3%
C 85053
 
7.3%
A 67721
 
5.8%
P 59989
 
5.2%
M 46638
 
4.0%
Z 46463
 
4.0%
T 40681
 
3.5%
E 39938
 
3.4%
H 39416
 
3.4%
Other values (16) 277394
23.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1161712
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U 233998
20.1%
S 224421
19.3%
C 85053
 
7.3%
A 67721
 
5.8%
P 59989
 
5.2%
M 46638
 
4.0%
Z 46463
 
4.0%
T 40681
 
3.5%
E 39938
 
3.4%
H 39416
 
3.4%
Other values (16) 277394
23.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 1161712
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
U 233998
20.1%
S 224421
19.3%
C 85053
 
7.3%
A 67721
 
5.8%
P 59989
 
5.2%
M 46638
 
4.0%
Z 46463
 
4.0%
T 40681
 
3.5%
E 39938
 
3.4%
H 39416
 
3.4%
Other values (16) 277394
23.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1161712
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
U 233998
20.1%
S 224421
19.3%
C 85053
 
7.3%
A 67721
 
5.8%
P 59989
 
5.2%
M 46638
 
4.0%
Z 46463
 
4.0%
T 40681
 
3.5%
E 39938
 
3.4%
H 39416
 
3.4%
Other values (16) 277394
23.9%

stateProvince
Text

Missing 

Distinct 2242
Distinct (%) 0.5%
Missing 93871
Missing (%) 16.1%
Memory size 4.5 MiB

Length

Max length 71
Median length 40
Mean length 9.131608388
Min length 3

Characters and Unicode

Total characters 4481072
Distinct characters 67
Distinct categories 9
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 420
Unique (%) 0.1%

Sample

1st row Asuncion
2nd row Florida
3rd row South Dakota
4th row Maine
5th row Palawan
Value                Count   Frequency (%)
california           23409   3.6%
new                  20454   3.1%
alaska               19385   3.0%
virginia             14953   2.3%
arizona              13147   2.0%
maryland             10719   1.6%
florida              10644   1.6%
texas                9775    1.5%
columbia             9291    1.4%
island               9097    1.4%
Other values (2044)  512747  78.4%

Most occurring characters

ValueCountFrequency (%)
a 688102
15.4%
i 363250
 
8.1%
n 330347
 
7.4%
o 310192
 
6.9%
r 284632
 
6.4%
e 240206
 
5.4%
l 198665
 
4.4%
s 197499
 
4.4%
162900
 
3.6%
t 158835
 
3.5%
Other values (57) 1546444
34.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3642328
81.3%
Uppercase Letter 655428
 
14.6%
Space Separator 162900
 
3.6%
Dash Punctuation 12832
 
0.3%
Other Punctuation 7148
 
0.2%
Open Punctuation 216
 
< 0.1%
Close Punctuation 216
 
< 0.1%
Decimal Number 3
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 688102
18.9%
i 363250
10.0%
n 330347
9.1%
o 310192
8.5%
r 284632
 
7.8%
e 240206
 
6.6%
l 198665
 
5.5%
s 197499
 
5.4%
t 158835
 
4.4%
u 137454
 
3.8%
Other values (18) 733146
20.1%
Uppercase Letter
ValueCountFrequency (%)
C 87330
13.3%
M 61534
 
9.4%
A 60288
 
9.2%
N 58797
 
9.0%
S 40319
 
6.2%
T 35065
 
5.3%
I 30921
 
4.7%
P 30209
 
4.6%
D 27952
 
4.3%
B 25816
 
3.9%
Other values (16) 197197
30.1%
Other Punctuation
ValueCountFrequency (%)
' 3008
42.1%
? 1713
24.0%
/ 1358
19.0%
. 901
 
12.6%
, 168
 
2.4%
Decimal Number
ValueCountFrequency (%)
2 1
33.3%
8 1
33.3%
6 1
33.3%
Space Separator
ValueCountFrequency (%)
162900
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 12832
100.0%
Open Punctuation
ValueCountFrequency (%)
( 216
100.0%
Close Punctuation
ValueCountFrequency (%)
) 216
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4297756
95.9%
Common 183316
 
4.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 688102
16.0%
i 363250
 
8.5%
n 330347
 
7.7%
o 310192
 
7.2%
r 284632
 
6.6%
e 240206
 
5.6%
l 198665
 
4.6%
s 197499
 
4.6%
t 158835
 
3.7%
u 137454
 
3.2%
Other values (44) 1388574
32.3%
Common
ValueCountFrequency (%)
162900
88.9%
- 12832
 
7.0%
' 3008
 
1.6%
? 1713
 
0.9%
/ 1358
 
0.7%
. 901
 
0.5%
( 216
 
0.1%
) 216
 
0.1%
, 168
 
0.1%
+ 1
 
< 0.1%
Other values (3) 3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4481070
> 99.9%
None 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 688102
15.4%
i 363250
 
8.1%
n 330347
 
7.4%
o 310192
 
6.9%
r 284632
 
6.4%
e 240206
 
5.4%
l 198665
 
4.4%
s 197499
 
4.4%
162900
 
3.6%
t 158835
 
3.5%
Other values (55) 1546442
34.5%
None
ValueCountFrequency (%)
ô 1
50.0%
é 1
50.0%

county
Text

Missing 

Distinct 3216
Distinct (%) 1.4%
Missing 353572
Missing (%) 60.5%
Memory size 4.5 MiB

Length

Max length 39
Median length 31
Mean length 9.707878106
Min length 1

Characters and Unicode

Total characters 2242714
Distinct characters 69
Distinct categories 8
Distinct scripts 2
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 641
Unique (%) 0.3%

Sample

1st row Palawan Province
2nd row Bergen
3rd row North Solomons Province
4th row Clarke
5th row Augusta
Value                Count   Frequency (%)
area                 7116    2.1%
census               7108    2.1%
province             5993    1.8%
bergen               4929    1.5%
aleutians            4466    1.3%
county               4430    1.3%
west                 4293    1.3%
borough              3777    1.1%
san                  3628    1.1%
latah                3591    1.1%
Other values (2933)  289412  85.4%

Most occurring characters

ValueCountFrequency (%)
a 244424
 
10.9%
e 199313
 
8.9%
n 165552
 
7.4%
o 159648
 
7.1%
r 146489
 
6.5%
i 116092
 
5.2%
107723
 
4.8%
t 103825
 
4.6%
s 98320
 
4.4%
l 98134
 
4.4%
Other values (59) 803194
35.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1783043
79.5%
Uppercase Letter 339922
 
15.2%
Space Separator 107723
 
4.8%
Other Punctuation 7691
 
0.3%
Dash Punctuation 3359
 
0.1%
Open Punctuation 488
 
< 0.1%
Close Punctuation 487
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 244424
13.7%
e 199313
11.2%
n 165552
9.3%
o 159648
9.0%
r 146489
 
8.2%
i 116092
 
6.5%
t 103825
 
5.8%
s 98320
 
5.5%
l 98134
 
5.5%
u 80396
 
4.5%
Other values (19) 370850
20.8%
Uppercase Letter
ValueCountFrequency (%)
C 46264
13.6%
S 28692
 
8.4%
A 28447
 
8.4%
M 26585
 
7.8%
B 26532
 
7.8%
P 24905
 
7.3%
D 17542
 
5.2%
L 16783
 
4.9%
N 15007
 
4.4%
H 14626
 
4.3%
Other values (16) 94539
27.8%
Other Punctuation
ValueCountFrequency (%)
' 3565
46.4%
/ 2007
26.1%
. 1654
21.5%
? 462
 
6.0%
& 2
 
< 0.1%
, 1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 3298
98.2%
61
 
1.8%
Open Punctuation
ValueCountFrequency (%)
( 481
98.6%
[ 7
 
1.4%
Close Punctuation
ValueCountFrequency (%)
) 480
98.6%
] 7
 
1.4%
Space Separator
ValueCountFrequency (%)
107723
100.0%
Math Symbol
ValueCountFrequency (%)
~ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2122965
94.7%
Common 119749
 
5.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 244424
 
11.5%
e 199313
 
9.4%
n 165552
 
7.8%
o 159648
 
7.5%
r 146489
 
6.9%
i 116092
 
5.5%
t 103825
 
4.9%
s 98320
 
4.6%
l 98134
 
4.6%
u 80396
 
3.8%
Other values (45) 710772
33.5%
Common
ValueCountFrequency (%)
107723
90.0%
' 3565
 
3.0%
- 3298
 
2.8%
/ 2007
 
1.7%
. 1654
 
1.4%
( 481
 
0.4%
) 480
 
0.4%
? 462
 
0.4%
61
 
0.1%
[ 7
 
< 0.1%
Other values (4) 11
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2242650
> 99.9%
Punctuation 61
 
< 0.1%
None 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 244424
 
10.9%
e 199313
 
8.9%
n 165552
 
7.4%
o 159648
 
7.1%
r 146489
 
6.5%
i 116092
 
5.2%
107723
 
4.8%
t 103825
 
4.6%
s 98320
 
4.4%
l 98134
 
4.4%
Other values (55) 803130
35.8%
Punctuation
ValueCountFrequency (%)
61
100.0%
None
ValueCountFrequency (%)
ô 1
33.3%
é 1
33.3%
ä 1
33.3%

locality
Text

Missing 

Distinct 64255
Distinct (%) 13.5%
Missing 107551
Missing (%) 18.4%
Memory size 4.5 MiB

Length

Max length 929
Median length 128
Mean length 17.88850853
Min length 1

Characters and Unicode

Total characters 8533552
Distinct characters 112
Distinct categories 12
Distinct scripts 2
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 33921
Unique (%) 7.1%

Sample

1st row Asuncion
2nd row Bryant, Near
3rd row Owl'S Head
4th row Nali Barrio, Dam Site, Quezon Municipality
5th row Fort Lee
Value                 Count    Frequency (%)
island                33520    2.4%
mi                    31811    2.3%
of                    23110    1.6%
river                 22675    1.6%
rio                   21864    1.6%
km                    18525    1.3%
fort                  14257    1.0%
san                   13196    0.9%
near                  13030    0.9%
lake                  11919    0.8%
Other values (33466)  1203009  85.5%

Most occurring characters

ValueCountFrequency (%)
929876
 
10.9%
a 913886
 
10.7%
e 542145
 
6.4%
o 539938
 
6.3%
n 524605
 
6.1%
i 502610
 
5.9%
r 415688
 
4.9%
l 354250
 
4.2%
t 336164
 
3.9%
s 280759
 
3.3%
Other values (102) 3193631
37.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5940218
69.6%
Uppercase Letter 1267138
 
14.8%
Space Separator 929876
 
10.9%
Other Punctuation 265001
 
3.1%
Decimal Number 105220
 
1.2%
Dash Punctuation 11355
 
0.1%
Open Punctuation 5161
 
0.1%
Close Punctuation 5158
 
0.1%
Math Symbol 4354
 
0.1%
Connector Punctuation 44
 
< 0.1%
Other values (2) 27
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 913886
15.4%
e 542145
9.1%
o 539938
9.1%
n 524605
8.8%
i 502610
 
8.5%
r 415688
 
7.0%
l 354250
 
6.0%
t 336164
 
5.7%
s 280759
 
4.7%
u 248246
 
4.2%
Other values (36) 1281927
21.6%
Uppercase Letter
ValueCountFrequency (%)
S 138099
 
10.9%
C 105764
 
8.3%
M 93638
 
7.4%
B 86701
 
6.8%
P 86223
 
6.8%
R 83609
 
6.6%
L 71717
 
5.7%
N 65417
 
5.2%
I 56763
 
4.5%
A 52124
 
4.1%
Other values (17) 427083
33.7%
Other Punctuation
ValueCountFrequency (%)
, 231908
87.5%
. 22167
 
8.4%
' 6055
 
2.3%
? 1337
 
0.5%
/ 958
 
0.4%
" 816
 
0.3%
: 659
 
0.2%
# 444
 
0.2%
& 364
 
0.1%
; 283
 
0.1%
Other values (3) 10
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 23097
22.0%
5 17642
16.8%
2 14487
13.8%
0 13388
12.7%
3 8220
 
7.8%
4 6775
 
6.4%
8 6256
 
5.9%
7 5786
 
5.5%
6 5236
 
5.0%
9 4333
 
4.1%
Math Symbol
ValueCountFrequency (%)
= 4291
98.6%
+ 56
 
1.3%
~ 7
 
0.2%
Open Punctuation
ValueCountFrequency (%)
( 3168
61.4%
[ 1992
38.6%
{ 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 3167
61.4%
] 1990
38.6%
} 1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 11354
> 99.9%
1
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
12
80.0%
3
 
20.0%
Space Separator
ValueCountFrequency (%)
929876
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 44
100.0%
Initial Punctuation
ValueCountFrequency (%)
12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7207356
84.5%
Common 1326196
 
15.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 913886
 
12.7%
e 542145
 
7.5%
o 539938
 
7.5%
n 524605
 
7.3%
i 502610
 
7.0%
r 415688
 
5.8%
l 354250
 
4.9%
t 336164
 
4.7%
s 280759
 
3.9%
u 248246
 
3.4%
Other values (63) 2549065
35.4%
Common
ValueCountFrequency (%)
929876
70.1%
, 231908
 
17.5%
1 23097
 
1.7%
. 22167
 
1.7%
5 17642
 
1.3%
2 14487
 
1.1%
0 13388
 
1.0%
- 11354
 
0.9%
3 8220
 
0.6%
4 6775
 
0.5%
Other values (29) 47282
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8533180
> 99.9%
None 344
 
< 0.1%
Punctuation 28
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
929876
 
10.9%
a 913886
 
10.7%
e 542145
 
6.4%
o 539938
 
6.3%
n 524605
 
6.1%
i 502610
 
5.9%
r 415688
 
4.9%
l 354250
 
4.2%
t 336164
 
3.9%
s 280759
 
3.3%
Other values (77) 3193259
37.4%
None
ValueCountFrequency (%)
ñ 80
23.3%
ô 69
20.1%
á 58
16.9%
í 35
10.2%
ā 21
 
6.1%
é 17
 
4.9%
ã 13
 
3.8%
è 10
 
2.9%
ú 9
 
2.6%
ö 8
 
2.3%
Other values (11) 24
 
7.0%
Punctuation
ValueCountFrequency (%)
12
42.9%
12
42.9%
3
 
10.7%
1
 
3.6%

verbatimElevation
Text

Missing 

Distinct 196
Distinct (%) 15.4%
Missing 583323
Missing (%) 99.8%
Memory size 4.5 MiB

Length

Max length 84
Median length 9
Mean length 13.72813239
Min length 3

Characters and Unicode

Total characters 17421
Distinct characters 55
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
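
As a minimal sketch of how such per-category counts can be derived, the snippet below tallies Unicode general categories over a few of the sample values shown for this variable. It uses Python's standard `unicodedata` module, which is an assumption made for illustration and not necessarily the library the report generator uses; the `CATEGORY_NAMES` mapping and the `category_counts` helper are hypothetical names.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from two-letter Unicode general-category codes
# to the long names that appear in the report's category tables.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Four of the sample values listed for verbatimElevation.
sample = ["ca. 1050 m", "ca. 4000 ft", "sea level", "6230 ft"]
counts = category_counts(sample)
for name, n in counts.most_common():
    print(name, n)
```

The standard library exposes general categories only; script and block properties would need a third-party Unicode database.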

Unique

Unique 108
Unique (%) 8.5%
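
The figures for this variable are consistent with the following reading, offered here as an inference from the numbers rather than a documented rule: the missing percentage is relative to all 584592 observations, while the distinct and unique percentages and the mean length are relative to the non-missing values only. A short check, with illustrative variable names:

```python
# Figures copied from the report for verbatimElevation.
n_obs = 584592        # number of observations in the dataset
n_missing = 583323    # missing values for this variable
n_distinct = 196
n_unique = 108
total_chars = 17421

# Values that are actually present.
n_present = n_obs - n_missing


def pct(part, whole):
    """Percentage rounded to one decimal place, as displayed in the report."""
    return round(100 * part / whole, 1)


missing_pct = pct(n_missing, n_obs)        # relative to all observations
distinct_pct = pct(n_distinct, n_present)  # relative to non-missing values
unique_pct = pct(n_unique, n_present)      # relative to non-missing values
mean_length = total_chars / n_present      # characters per non-missing value

print(n_present, missing_pct, distinct_pct, unique_pct, round(mean_length, 8))
```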

Sample

1st row: altitude uncertain: label says both 5500 ft and 7000 ft
2nd row: ca. 1050 m
3rd row: ca. 4000 ft
4th row: sea level
5th row: 6230 ft
ValueCountFrequency (%)
sea 769
20.9%
level 769
20.9%
ft 409
11.1%
ca 177
 
4.8%
m 115
 
3.1%
says 114
 
3.1%
label 100
 
2.7%
altitude 92
 
2.5%
uncertain 74
 
2.0%
of 67
 
1.8%
Other values (170) 986
26.9%

Most occurring characters

ValueCountFrequency (%)
e 2820
16.2%
2403
13.8%
l 1955
11.2%
a 1546
8.9%
0 1357
 
7.8%
s 1076
 
6.2%
t 881
 
5.1%
v 812
 
4.7%
f 520
 
3.0%
n 353
 
2.0%
Other values (45) 3698
21.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12005
68.9%
Decimal Number 2407
 
13.8%
Space Separator 2403
 
13.8%
Other Punctuation 415
 
2.4%
Math Symbol 88
 
0.5%
Dash Punctuation 72
 
0.4%
Uppercase Letter 27
 
0.2%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2820
23.5%
l 1955
16.3%
a 1546
12.9%
s 1076
 
9.0%
t 881
 
7.3%
v 812
 
6.8%
f 520
 
4.3%
n 353
 
2.9%
c 298
 
2.5%
i 282
 
2.3%
Other values (14) 1462
12.2%
Decimal Number
ValueCountFrequency (%)
0 1357
56.4%
1 245
 
10.2%
5 213
 
8.8%
6 133
 
5.5%
2 119
 
4.9%
3 102
 
4.2%
8 92
 
3.8%
9 57
 
2.4%
4 54
 
2.2%
7 35
 
1.5%
Uppercase Letter
ValueCountFrequency (%)
S 9
33.3%
L 8
29.6%
E 4
14.8%
A 2
 
7.4%
O 1
 
3.7%
C 1
 
3.7%
I 1
 
3.7%
B 1
 
3.7%
Other Punctuation
ValueCountFrequency (%)
. 236
56.9%
: 99
23.9%
, 55
 
13.3%
? 17
 
4.1%
" 4
 
1.0%
; 4
 
1.0%
Math Symbol
ValueCountFrequency (%)
< 34
38.6%
> 33
37.5%
+ 21
23.9%
Space Separator
ValueCountFrequency (%)
2403
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 72
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12032
69.1%
Common 5389
30.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2820
23.4%
l 1955
16.2%
a 1546
12.8%
s 1076
 
8.9%
t 881
 
7.3%
v 812
 
6.7%
f 520
 
4.3%
n 353
 
2.9%
c 298
 
2.5%
i 282
 
2.3%
Other values (22) 1489
12.4%
Common
ValueCountFrequency (%)
2403
44.6%
0 1357
25.2%
1 245
 
4.5%
. 236
 
4.4%
5 213
 
4.0%
6 133
 
2.5%
2 119
 
2.2%
3 102
 
1.9%
: 99
 
1.8%
8 92
 
1.7%
Other values (13) 390
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17421
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 2820
16.2%
2403
13.8%
l 1955
11.2%
a 1546
8.9%
0 1357
 
7.8%
s 1076
 
6.2%
t 881
 
5.1%
v 812
 
4.7%
f 520
 
3.0%
n 353
 
2.0%
Other values (45) 3698
21.2%

decimalLatitude
Text

Missing 

Distinct 3290
Distinct (%) 11.7%
Missing 556566
Missing (%) 95.2%
Memory size 4.5 MiB

Length

Max length 9
Median length 8
Mean length 5.238100335
Min length 3

Characters and Unicode

Total characters 146803
Distinct characters 12
Distinct categories 3
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1421
Unique (%) 5.1%

Sample

1st row: 38.4236
2nd row: 5.85
3rd row: 7.97
4th row: 10.52
5th row: 0.35
ValueCountFrequency (%)
34.9606 991
 
3.5%
31.5011 663
 
2.4%
9.03 592
 
2.1%
8.25 507
 
1.8%
6.45 506
 
1.8%
29.3467 473
 
1.7%
3.65 448
 
1.6%
6.17 374
 
1.3%
12.63 310
 
1.1%
68.13 307
 
1.1%
Other values (3004) 22855
81.5%

Most occurring characters

ValueCountFrequency (%)
. 28026
19.1%
3 14891
10.1%
1 14405
9.8%
5 12119
8.3%
6 11694
8.0%
8 11032
 
7.5%
4 10580
 
7.2%
7 10374
 
7.1%
2 9852
 
6.7%
0 9609
 
6.5%
Other values (2) 14221
9.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 112692
76.8%
Other Punctuation 28026
 
19.1%
Dash Punctuation 6085
 
4.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 14891
13.2%
1 14405
12.8%
5 12119
10.8%
6 11694
10.4%
8 11032
9.8%
4 10580
9.4%
7 10374
9.2%
2 9852
8.7%
0 9609
8.5%
9 8136
7.2%
Other Punctuation
ValueCountFrequency (%)
. 28026
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6085
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 146803
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 28026
19.1%
3 14891
10.1%
1 14405
9.8%
5 12119
8.3%
6 11694
8.0%
8 11032
 
7.5%
4 10580
 
7.2%
7 10374
 
7.1%
2 9852
 
6.7%
0 9609
 
6.5%
Other values (2) 14221
9.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 146803
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 28026
19.1%
3 14891
10.1%
1 14405
9.8%
5 12119
8.3%
6 11694
8.0%
8 11032
 
7.5%
4 10580
 
7.2%
7 10374
 
7.1%
2 9852
 
6.7%
0 9609
 
6.5%
Other values (2) 14221
9.7%

decimalLongitude
Text

Missing 

Distinct 3651
Distinct (%) 13.0%
Missing 556566
Missing (%) 95.2%
Memory size 4.5 MiB

Length

Max length 11
Median length 10
Mean length 6.171804753
Min length 3

Characters and Unicode

Total characters 172971
Distinct characters 12
Distinct categories 3
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1668
Unique (%) 6.0%

Sample

1st row: -79.2803
2nd row: 116.08
3rd row: -73.95
4th row: -75.02
5th row: -176.53
ValueCountFrequency (%)
69.2778 991
 
3.5%
65.8453 663
 
2.4%
36.15 546
 
1.9%
38.18 502
 
1.8%
47.5206 473
 
1.7%
34.58 464
 
1.7%
52.37 452
 
1.6%
37.5 368
 
1.3%
165.95 307
 
1.1%
74.08 295
 
1.1%
Other values (3513) 22965
81.9%

Most occurring characters

ValueCountFrequency (%)
. 28026
16.2%
7 19303
11.2%
1 16358
9.5%
- 15622
9.0%
3 14216
8.2%
5 13864
8.0%
2 12920
7.5%
6 12744
7.4%
8 12375
7.2%
9 10013
 
5.8%
Other values (2) 17530
10.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 129323
74.8%
Other Punctuation 28026
 
16.2%
Dash Punctuation 15622
 
9.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 19303
14.9%
1 16358
12.6%
3 14216
11.0%
5 13864
10.7%
2 12920
10.0%
6 12744
9.9%
8 12375
9.6%
9 10013
7.7%
0 8870
6.9%
4 8660
6.7%
Other Punctuation
ValueCountFrequency (%)
. 28026
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 15622
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 172971
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 28026
16.2%
7 19303
11.2%
1 16358
9.5%
- 15622
9.0%
3 14216
8.2%
5 13864
8.0%
2 12920
7.5%
6 12744
7.4%
8 12375
7.2%
9 10013
 
5.8%
Other values (2) 17530
10.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 172971
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 28026
16.2%
7 19303
11.2%
1 16358
9.5%
- 15622
9.0%
3 14216
8.2%
5 13864
8.0%
2 12920
7.5%
6 12744
7.4%
8 12375
7.2%
9 10013
 
5.8%
Other values (2) 17530
10.1%
Distinct 4
Distinct (%) < 0.1%
Missing 567281
Missing (%) 97.0%
Memory size 4.5 MiB

Length

Max length 23
Median length 23
Mean length 22.88076945
Min length 3

Characters and Unicode

Total characters 396089
Distinct characters 22
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row: Degrees Minutes Seconds
2nd row: Degrees Minutes Seconds
3rd row: Degrees Minutes Seconds
4th row: Degrees Minutes Seconds
5th row: Degrees Minutes Seconds
ValueCountFrequency (%)
degrees 17208
33.3%
minutes 17206
33.3%
seconds 17206
33.3%
utm 100
 
0.2%
unknown 3
 
< 0.1%
decimal 2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 86038
21.7%
s 51620
13.0%
n 34421
 
8.7%
34414
 
8.7%
M 17306
 
4.4%
o 17209
 
4.3%
D 17208
 
4.3%
c 17208
 
4.3%
g 17208
 
4.3%
r 17208
 
4.3%
Other values (12) 86249
21.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 309752
78.2%
Uppercase Letter 51923
 
13.1%
Space Separator 34414
 
8.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 86038
27.8%
s 51620
16.7%
n 34421
11.1%
o 17209
 
5.6%
c 17208
 
5.6%
g 17208
 
5.6%
r 17208
 
5.6%
i 17208
 
5.6%
d 17208
 
5.6%
t 17206
 
5.6%
Other values (6) 17218
 
5.6%
Uppercase Letter
ValueCountFrequency (%)
M 17306
33.3%
D 17208
33.1%
S 17206
33.1%
U 103
 
0.2%
T 100
 
0.2%
Space Separator
ValueCountFrequency (%)
34414
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 361675
91.3%
Common 34414
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 86038
23.8%
s 51620
14.3%
n 34421
9.5%
M 17306
 
4.8%
o 17209
 
4.8%
D 17208
 
4.8%
c 17208
 
4.8%
g 17208
 
4.8%
r 17208
 
4.8%
i 17208
 
4.8%
Other values (11) 69041
19.1%
Common
ValueCountFrequency (%)
34414
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 396089
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 86038
21.7%
s 51620
13.0%
n 34421
 
8.7%
34414
 
8.7%
M 17306
 
4.4%
o 17209
 
4.3%
D 17208
 
4.3%
c 17208
 
4.3%
g 17208
 
4.3%
r 17208
 
4.3%
Other values (12) 86249
21.8%

georeferenceProtocol
Text

Missing 

Distinct 11
Distinct (%) 0.9%
Missing 583342
Missing (%) 99.8%
Memory size 4.5 MiB

Length

Max length 21
Median length 3
Mean length 7.1184
Min length 3

Characters and Unicode

Total characters 8898
Distinct characters 33
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3
Unique (%) 0.2%

Sample

1st row: GEOLocate tool
2nd row: GPS
3rd row: Google Earth maps
4th row: GPS
5th row: GPS
ValueCountFrequency (%)
gps 739
39.4%
earth 195
 
10.4%
maps 195
 
10.4%
google 195
 
10.4%
geolocate 179
 
9.6%
tool 179
 
9.6%
map 109
 
5.8%
online 18
 
1.0%
recorded 15
 
0.8%
not 15
 
0.8%
Other values (7) 35
 
1.9%

Most occurring characters

ValueCountFrequency (%)
G 1114
12.5%
o 988
11.1%
P 739
 
8.3%
S 739
 
8.3%
a 700
 
7.9%
624
 
7.0%
t 582
 
6.5%
e 436
 
4.9%
l 413
 
4.6%
E 374
 
4.2%
Other values (23) 2189
24.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4838
54.4%
Uppercase Letter 3436
38.6%
Space Separator 624
 
7.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 988
20.4%
a 700
14.5%
t 582
12.0%
e 436
9.0%
l 413
8.5%
p 306
 
6.3%
m 244
 
5.0%
r 237
 
4.9%
c 205
 
4.2%
g 195
 
4.0%
Other values (12) 532
11.0%
Uppercase Letter
ValueCountFrequency (%)
G 1114
32.4%
P 739
21.5%
S 739
21.5%
E 374
 
10.9%
O 179
 
5.2%
L 179
 
5.2%
M 81
 
2.4%
U 11
 
0.3%
C 10
 
0.3%
T 10
 
0.3%
Space Separator
ValueCountFrequency (%)
624
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8274
93.0%
Common 624
 
7.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
G 1114
13.5%
o 988
11.9%
P 739
 
8.9%
S 739
 
8.9%
a 700
 
8.5%
t 582
 
7.0%
e 436
 
5.3%
l 413
 
5.0%
E 374
 
4.5%
p 306
 
3.7%
Other values (22) 1883
22.8%
Common
ValueCountFrequency (%)
624
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8898
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G 1114
12.5%
o 988
11.1%
P 739
 
8.3%
S 739
 
8.3%
a 700
 
7.9%
624
 
7.0%
t 582
 
6.5%
e 436
 
4.9%
l 413
 
4.6%
E 374
 
4.2%
Other values (23) 2189
24.6%
Distinct 5
Distinct (%) 0.7%
Missing 583894
Missing (%) 99.9%
Memory size 4.5 MiB

Length

Max length 12
Median length 9
Mean length 8.736389685
Min length 3

Characters and Unicode

Total characters 6098
Distinct characters 15
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) 0.1%

Sample

1st row: uncertain
2nd row: uncertain
3rd row: uncertain
4th row: uncertain
5th row: uncertain
ValueCountFrequency (%)
uncertain 663
94.3%
cf 29
 
4.1%
sp 4
 
0.6%
aff 4
 
0.6%
near 2
 
0.3%
vel 1
 
0.1%

Most occurring characters

ValueCountFrequency (%)
n 1328
21.8%
c 692
11.3%
a 669
11.0%
e 666
10.9%
r 665
10.9%
u 663
10.9%
t 663
10.9%
i 663
10.9%
f 37
 
0.6%
. 37
 
0.6%
Other values (5) 15
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6056
99.3%
Other Punctuation 37
 
0.6%
Space Separator 5
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 1328
21.9%
c 692
11.4%
a 669
11.0%
e 666
11.0%
r 665
11.0%
u 663
10.9%
t 663
10.9%
i 663
10.9%
f 37
 
0.6%
s 4
 
0.1%
Other values (3) 6
 
0.1%
Other Punctuation
ValueCountFrequency (%)
. 37
100.0%
Space Separator
ValueCountFrequency (%)
5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6056
99.3%
Common 42
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 1328
21.9%
c 692
11.4%
a 669
11.0%
e 666
11.0%
r 665
11.0%
u 663
10.9%
t 663
10.9%
i 663
10.9%
f 37
 
0.6%
s 4
 
0.1%
Other values (3) 6
 
0.1%
Common
ValueCountFrequency (%)
. 37
88.1%
5
 
11.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6098
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 1328
21.8%
c 692
11.3%
a 669
11.0%
e 666
10.9%
r 665
10.9%
u 663
10.9%
t 663
10.9%
i 663
10.9%
f 37
 
0.6%
. 37
 
0.6%
Other values (5) 15
 
0.2%

typeStatus
Text

Missing 

Distinct 3
Distinct (%) 0.1%
Missing 580632
Missing (%) 99.3%
Memory size 4.5 MiB

Length

Max length 9
Median length 4
Mean length 4.607323232
Min length 4

Characters and Unicode

Total characters 18245
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row: COTYPE
2nd row: TYPE
3rd row: TYPE
4th row: TYPE
5th row: TYPE
ValueCountFrequency (%)
type 2759
69.7%
cotype 1200
30.3%
lectotype 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
T 3961
21.7%
E 3961
21.7%
Y 3960
21.7%
P 3960
21.7%
C 1201
 
6.6%
O 1201
 
6.6%
L 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 18245
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 3961
21.7%
E 3961
21.7%
Y 3960
21.7%
P 3960
21.7%
C 1201
 
6.6%
O 1201
 
6.6%
L 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 18245
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 3961
21.7%
E 3961
21.7%
Y 3960
21.7%
P 3960
21.7%
C 1201
 
6.6%
O 1201
 
6.6%
L 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 18245
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 3961
21.7%
E 3961
21.7%
Y 3960
21.7%
P 3960
21.7%
C 1201
 
6.6%
O 1201
 
6.6%
L 1
 
< 0.1%

identifiedBy
Text

Missing 

Distinct 69
Distinct (%) 2.0%
Missing 581206
Missing (%) 99.4%
Memory size 4.5 MiB

Length

Max length 129
Median length 18
Mean length 24.97489663
Min length 9

Characters and Unicode

Total characters 84565
Distinct characters 60
Distinct categories 8
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 22
Unique (%) 0.6%

Sample

1st row: Wetmore, Alexander
2nd row: Maley, James M, Collections Manager, Occidental College - Moore Laboratory of Zoology (UNITED STATES)
3rd row: Wetmore, Alexander
4th row: Verhelst, Juan C
5th row: Clark, W. S.
ValueCountFrequency (%)
wetmore 2393
21.9%
alexander 2382
21.8%
of 294
 
2.7%
268
 
2.5%
united 266
 
2.4%
states 265
 
2.4%
museum 246
 
2.3%
history 200
 
1.8%
natural 200
 
1.8%
birds 198
 
1.8%
Other values (178) 4219
38.6%

Most occurring characters

ValueCountFrequency (%)
e 11582
13.7%
7545
 
8.9%
r 6594
 
7.8%
a 5098
 
6.0%
o 5033
 
6.0%
t 4517
 
5.3%
n 4224
 
5.0%
l 4082
 
4.8%
, 3962
 
4.7%
m 3194
 
3.8%
Other values (50) 28734
34.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 57333
67.8%
Uppercase Letter 13878
 
16.4%
Space Separator 7545
 
8.9%
Other Punctuation 4577
 
5.4%
Close Punctuation 477
 
0.6%
Open Punctuation 477
 
0.6%
Dash Punctuation 270
 
0.3%
Decimal Number 8
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 11582
20.2%
r 6594
11.5%
a 5098
8.9%
o 5033
8.8%
t 4517
 
7.9%
n 4224
 
7.4%
l 4082
 
7.1%
m 3194
 
5.6%
d 2638
 
4.6%
x 2382
 
4.2%
Other values (16) 7989
13.9%
Uppercase Letter
ValueCountFrequency (%)
A 2751
19.8%
W 2686
19.4%
S 1048
 
7.6%
I 824
 
5.9%
T 808
 
5.8%
M 697
 
5.0%
N 690
 
5.0%
C 642
 
4.6%
D 607
 
4.4%
E 558
 
4.0%
Other values (14) 2567
18.5%
Decimal Number
ValueCountFrequency (%)
1 2
25.0%
9 2
25.0%
5 2
25.0%
0 2
25.0%
Other Punctuation
ValueCountFrequency (%)
, 3962
86.6%
. 615
 
13.4%
Space Separator
ValueCountFrequency (%)
7545
100.0%
Close Punctuation
ValueCountFrequency (%)
) 477
100.0%
Open Punctuation
ValueCountFrequency (%)
( 477
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 270
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 71211
84.2%
Common 13354
 
15.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 11582
16.3%
r 6594
 
9.3%
a 5098
 
7.2%
o 5033
 
7.1%
t 4517
 
6.3%
n 4224
 
5.9%
l 4082
 
5.7%
m 3194
 
4.5%
A 2751
 
3.9%
W 2686
 
3.8%
Other values (40) 21450
30.1%
Common
ValueCountFrequency (%)
7545
56.5%
, 3962
29.7%
. 615
 
4.6%
) 477
 
3.6%
( 477
 
3.6%
- 270
 
2.0%
1 2
 
< 0.1%
9 2
 
< 0.1%
5 2
 
< 0.1%
0 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 84564
> 99.9%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 11582
13.7%
7545
 
8.9%
r 6594
 
7.8%
a 5098
 
6.0%
o 5033
 
6.0%
t 4517
 
5.3%
n 4224
 
5.0%
l 4082
 
4.8%
, 3962
 
4.7%
m 3194
 
3.8%
Other values (49) 28733
34.0%
None
ValueCountFrequency (%)
à 1
100.0%
Distinct 18485
Distinct (%) 3.2%
Missing 0
Missing (%) 0.0%
Memory size 4.5 MiB

Length

Max length 8
Median length 7
Mean length 7.009553672
Min length 3

Characters and Unicode

Total characters 4097729
Distinct characters 10
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2480
Unique (%) 0.4%

Sample

1st row: 2492087
2nd row: 2480415
3rd row: 2481705
4th row: 9367409
5th row: 5229959
ValueCountFrequency (%)
9409198 2991
 
0.5%
7192429 1918
 
0.3%
7191991 1808
 
0.3%
9685907 1565
 
0.3%
9791464 1425
 
0.2%
7341805 1363
 
0.2%
2489985 1286
 
0.2%
5231142 1245
 
0.2%
2473421 1244
 
0.2%
2489670 1187
 
0.2%
Other values (18475) 568560
97.3%

Most occurring characters

ValueCountFrequency (%)
2 550899
13.4%
4 487888
11.9%
1 454466
11.1%
9 450768
11.0%
7 438367
10.7%
8 395864
9.7%
6 370734
9.0%
0 331876
8.1%
5 312473
7.6%
3 304394
7.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4097729
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 550899
13.4%
4 487888
11.9%
1 454466
11.1%
9 450768
11.0%
7 438367
10.7%
8 395864
9.7%
6 370734
9.0%
0 331876
8.1%
5 312473
7.6%
3 304394
7.4%

Most occurring scripts

ValueCountFrequency (%)
Common 4097729
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 550899
13.4%
4 487888
11.9%
1 454466
11.1%
9 450768
11.0%
7 438367
10.7%
8 395864
9.7%
6 370734
9.0%
0 331876
8.1%
5 312473
7.6%
3 304394
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4097729
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 550899
13.4%
4 487888
11.9%
1 454466
11.1%
9 450768
11.0%
7 438367
10.7%
8 395864
9.7%
6 370734
9.0%
0 331876
8.1%
5 312473
7.6%
3 304394
7.4%
Distinct 18875
Distinct (%) 3.2%
Missing 0
Missing (%) 0.0%
Memory size 4.5 MiB

Length

Max length 101
Median length 68
Mean length 36.39443065
Min length 4

Characters and Unicode

Total characters 21275893
Distinct characters 78
Distinct categories 8
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2553
Unique (%) 0.4%

Sample

1st row: Paroaria capitata (d'Orbigny & Lafresnaye, 1837)
2nd row: Rostrhamus sociabilis (Vieillot, 1817)
3rd row: Bartramia longicauda (Bechstein, 1812)
4th row: Sterna hirundo Linnaeus, 1758
5th row: Prionochilus plateni W.Blasius, 1888
ValueCountFrequency (%)
linnaeus 95179
 
3.9%
1758 62131
 
2.5%
1766 31804
 
1.3%
1789 23736
 
1.0%
21524
 
0.9%
vieillot 20514
 
0.8%
j.f.gmelin 17875
 
0.7%
ridgway 14989
 
0.6%
dendroica 14825
 
0.6%
gmelin 12921
 
0.5%
Other values (11256) 2141359
87.2%

Most occurring characters

ValueCountFrequency (%)
1872265
 
8.8%
a 1760824
 
8.3%
i 1561286
 
7.3%
s 1382114
 
6.5%
e 1247644
 
5.9%
n 1111280
 
5.2%
r 1081068
 
5.1%
u 984426
 
4.6%
l 968653
 
4.6%
o 962179
 
4.5%
Other values (68) 8344154
39.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15008719
70.5%
Decimal Number 1926600
 
9.1%
Space Separator 1872265
 
8.8%
Uppercase Letter 1240000
 
5.8%
Other Punctuation 643869
 
3.0%
Open Punctuation 290919
 
1.4%
Close Punctuation 290919
 
1.4%
Dash Punctuation 2602
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1760824
11.7%
i 1561286
10.4%
s 1382114
9.2%
e 1247644
 
8.3%
n 1111280
 
7.4%
r 1081068
 
7.2%
u 984426
 
6.6%
l 968653
 
6.5%
o 962179
 
6.4%
t 711082
 
4.7%
Other values (23) 3238163
21.6%
Uppercase Letter
ValueCountFrequency (%)
L 162966
13.1%
P 118987
 
9.6%
S 116593
 
9.4%
C 116417
 
9.4%
G 77102
 
6.2%
A 76812
 
6.2%
B 72288
 
5.8%
T 64274
 
5.2%
M 63743
 
5.1%
R 48447
 
3.9%
Other values (17) 322371
26.0%
Decimal Number
ValueCountFrequency (%)
1 566928
29.4%
8 422023
21.9%
7 236330
12.3%
9 153919
 
8.0%
6 130112
 
6.8%
5 120708
 
6.3%
3 84592
 
4.4%
2 76423
 
4.0%
0 68473
 
3.6%
4 67092
 
3.5%
Other Punctuation
ValueCountFrequency (%)
, 481794
74.8%
. 139512
 
21.7%
& 21524
 
3.3%
' 1039
 
0.2%
Space Separator
ValueCountFrequency (%)
1872265
100.0%
Open Punctuation
ValueCountFrequency (%)
( 290919
100.0%
Close Punctuation
ValueCountFrequency (%)
) 290919
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2602
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 16248719
76.4%
Common 5027174
 
23.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1760824
 
10.8%
i 1561286
 
9.6%
s 1382114
 
8.5%
e 1247644
 
7.7%
n 1111280
 
6.8%
r 1081068
 
6.7%
u 984426
 
6.1%
l 968653
 
6.0%
o 962179
 
5.9%
t 711082
 
4.4%
Other values (50) 4478163
27.6%
Common
ValueCountFrequency (%)
1872265
37.2%
1 566928
 
11.3%
, 481794
 
9.6%
8 422023
 
8.4%
( 290919
 
5.8%
) 290919
 
5.8%
7 236330
 
4.7%
9 153919
 
3.1%
. 139512
 
2.8%
6 130112
 
2.6%
Other values (8) 442453
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21269847
> 99.9%
None 6046
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1872265
 
8.8%
a 1760824
 
8.3%
i 1561286
 
7.3%
s 1382114
 
6.5%
e 1247644
 
5.9%
n 1111280
 
5.2%
r 1081068
 
5.1%
u 984426
 
4.6%
l 968653
 
4.6%
o 962179
 
4.5%
Other values (60) 8338108
39.2%
None
ValueCountFrequency (%)
ü 4335
71.7%
é 883
 
14.6%
á 360
 
6.0%
è 250
 
4.1%
ö 103
 
1.7%
ä 90
 
1.5%
É 17
 
0.3%
ø 8
 
0.1%
Distinct 185
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 4.5 MiB

Length

Max length 89
Median length 78
Mean length 65.97973972
Min length 45

Characters and Unicode

Total characters 38571228
Distinct characters 47
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2
Unique (%) < 0.1%

Sample

1st row: Animalia, Chordata, Vertebrata, Aves, Passeriformes, Emberizidae, Emberizinae
2nd row: Animalia, Chordata, Vertebrata, Aves, Falconiformes, Accipitridae
3rd row: Animalia, Chordata, Vertebrata, Aves, Charadriiformes, Scolopacidae
4th row: Animalia, Chordata, Vertebrata, Aves, Charadriiformes, Laridae
5th row: Animalia, Chordata, Vertebrata, Aves, Passeriformes, Dicaeidae
ValueCountFrequency (%)
animalia 584592
16.0%
aves 584592
16.0%
chordata 584592
16.0%
vertebrata 584592
16.0%
passeriformes 372479
10.2%
emberizidae 72754
 
2.0%
emberizinae 50573
 
1.4%
charadriiformes 44080
 
1.2%
parulidae 36362
 
1.0%
tyrannidae 27497
 
0.8%
Other values (206) 702489
19.3%

Most occurring characters

ValueCountFrequency (%)
a 5107701
13.2%
e 3704731
 
9.6%
r 3367631
 
8.7%
3060010
 
7.9%
, 3060009
 
7.9%
i 3035379
 
7.9%
s 1981990
 
5.1%
t 1944511
 
5.0%
o 1467327
 
3.8%
m 1357393
 
3.5%
Other values (37) 10484546
27.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 28806607
74.7%
Uppercase Letter 3644601
 
9.4%
Space Separator 3060010
 
7.9%
Other Punctuation 3060010
 
7.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 5107701
17.7%
e 3704731
12.9%
r 3367631
11.7%
i 3035379
10.5%
s 1981990
 
6.9%
t 1944511
 
6.8%
o 1467327
 
5.1%
m 1357393
 
4.7%
d 1347875
 
4.7%
n 998616
 
3.5%
Other values (13) 4493453
15.6%
Uppercase Letter
ValueCountFrequency (%)
A 1269397
34.8%
C 741968
20.4%
V 592119
16.2%
P 536883
14.7%
E 129534
 
3.6%
T 114606
 
3.1%
S 71084
 
2.0%
F 49758
 
1.4%
M 26183
 
0.7%
G 25043
 
0.7%
Other values (11) 88026
 
2.4%
Other Punctuation
ValueCountFrequency (%)
, 3060009
> 99.9%
? 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
3060010
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 32451208
84.1%
Common 6120020
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 5107701
15.7%
e 3704731
11.4%
r 3367631
10.4%
i 3035379
 
9.4%
s 1981990
 
6.1%
t 1944511
 
6.0%
o 1467327
 
4.5%
m 1357393
 
4.2%
d 1347875
 
4.2%
A 1269397
 
3.9%
Other values (34) 7867273
24.2%
Common
ValueCountFrequency (%)
3060010
50.0%
, 3060009
50.0%
? 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 38571228
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 5107701
13.2%
e 3704731
 
9.6%
r 3367631
 
8.7%
3060010
 
7.9%
, 3060009
 
7.9%
i 3035379
 
7.9%
s 1981990
 
5.1%
t 1944511
 
5.0%
o 1467327
 
3.8%
m 1357393
 
3.5%
Other values (37) 10484546
27.2%

kingdom
Text

Constant 

Distinct 1
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 4.5 MiB

Length

Max length 8
Median length 8
Mean length 8
Min length 8

Characters and Unicode

Total characters 4676736
Distinct characters 6
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row: Animalia
2nd row: Animalia
3rd row: Animalia
4th row: Animalia
5th row: Animalia
ValueCountFrequency (%)
animalia 584592
100.0%

Most occurring characters

ValueCountFrequency (%)
i 1169184
25.0%
a 1169184
25.0%
A 584592
12.5%
n 584592
12.5%
m 584592
12.5%
l 584592
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4092144
87.5%
Uppercase Letter 584592
 
12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 1169184
28.6%
a 1169184
28.6%
n 584592
14.3%
m 584592
14.3%
l 584592
14.3%
Uppercase Letter
ValueCountFrequency (%)
A 584592
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4676736
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 1169184
25.0%
a 1169184
25.0%
A 584592
12.5%
n 584592
12.5%
m 584592
12.5%
l 584592
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4676736
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 1169184
25.0%
a 1169184
25.0%
A 584592
12.5%
n 584592
12.5%
m 584592
12.5%
l 584592
12.5%

phylum
Text

Constant 

Distinct 1
Distinct (%) < 0.1%
Missing 5
Missing (%) < 0.1%
Memory size 4.5 MiB

Length

Max length 8
Median length 8
Mean length 8
Min length 8

Characters and Unicode

Total characters 4676696
Distinct characters 7
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row: Chordata
2nd row: Chordata
3rd row: Chordata
4th row: Chordata
5th row: Chordata
ValueCountFrequency (%)
chordata 584587
100.0%

Most occurring characters

ValueCountFrequency (%)
a 1169174
25.0%
C 584587
12.5%
h 584587
12.5%
o 584587
12.5%
r 584587
12.5%
d 584587
12.5%
t 584587
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4092109
87.5%
Uppercase Letter 584587
 
12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1169174
28.6%
h 584587
14.3%
o 584587
14.3%
r 584587
14.3%
d 584587
14.3%
t 584587
14.3%
Uppercase Letter
ValueCountFrequency (%)
C 584587
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4676696
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1169174
25.0%
C 584587
12.5%
h 584587
12.5%
o 584587
12.5%
r 584587
12.5%
d 584587
12.5%
t 584587
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4676696
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1169174
25.0%
C 584587
12.5%
h 584587
12.5%
o 584587
12.5%
r 584587
12.5%
d 584587
12.5%
t 584587
12.5%

class
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing5
Missing (%)< 0.1%
Memory size4.5 MiB

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters2338348
Distinct characters4
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowAves
2nd rowAves
3rd rowAves
4th rowAves
5th rowAves
Value  Count   Frequency (%)
aves   584587  100.0%

Most occurring characters

ValueCountFrequency (%)
A 584587
25.0%
v 584587
25.0%
e 584587
25.0%
s 584587
25.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1753761
75.0%
Uppercase Letter 584587
 
25.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
v 584587
33.3%
e 584587
33.3%
s 584587
33.3%
Uppercase Letter
ValueCountFrequency (%)
A 584587
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2338348
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 584587
25.0%
v 584587
25.0%
e 584587
25.0%
s 584587
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2338348
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 584587
25.0%
v 584587
25.0%
e 584587
25.0%
s 584587
25.0%

order
Text

Distinct42
Distinct (%)< 0.1%
Missing20
Missing (%)< 0.1%
Memory size4.5 MiB

Length

Max length19
Median length13
Mean length12.96889006
Min length10

Characters and Unicode

Total characters7581250
Distinct characters34
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPasseriformes
2nd rowAccipitriformes
3rd rowCharadriiformes
4th rowCharadriiformes
5th rowPasseriformes
Value              Count   Frequency (%)
passeriformes      372474  63.7%
charadriiformes    44387   7.6%
piciformes         22599   3.9%
apodiformes        18185   3.1%
anseriformes       15668   2.7%
galliformes        14813   2.5%
columbiformes      12800   2.2%
accipitriformes    11414   2.0%
coraciiformes      7822    1.3%
psittaciformes     7419    1.3%
Other values (32)  56991   9.7%

Most occurring characters

ValueCountFrequency (%)
s 1353697
17.9%
r 1116871
14.7%
e 999308
13.2%
i 716230
9.4%
o 644987
8.5%
m 602388
7.9%
f 584572
7.7%
a 516843
 
6.8%
P 419526
 
5.5%
c 90235
 
1.2%
Other values (24) 536593
 
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6996678
92.3%
Uppercase Letter 584572
 
7.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 1353697
19.3%
r 1116871
16.0%
e 999308
14.3%
i 716230
10.2%
o 644987
9.2%
m 602388
8.6%
f 584572
8.4%
a 516843
 
7.4%
c 90235
 
1.3%
l 82324
 
1.2%
Other values (10) 289223
 
4.1%
Uppercase Letter
ValueCountFrequency (%)
P 419526
71.8%
C 75814
 
13.0%
A 45305
 
7.8%
G 22108
 
3.8%
S 11813
 
2.0%
F 4459
 
0.8%
T 3057
 
0.5%
B 1625
 
0.3%
M 348
 
0.1%
O 237
 
< 0.1%
Other values (4) 280
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 7581250
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 1353697
17.9%
r 1116871
14.7%
e 999308
13.2%
i 716230
9.4%
o 644987
8.5%
m 602388
7.9%
f 584572
7.7%
a 516843
 
6.8%
P 419526
 
5.5%
c 90235
 
1.2%
Other values (24) 536593
 
7.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7581250
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 1353697
17.9%
r 1116871
14.7%
e 999308
13.2%
i 716230
9.4%
o 644987
8.5%
m 602388
7.9%
f 584572
7.7%
a 516843
 
6.8%
P 419526
 
5.5%
c 90235
 
1.2%
Other values (24) 536593
 
7.1%

family
Text

Distinct239
Distinct (%)< 0.1%
Missing17
Missing (%)< 0.1%
Memory size4.5 MiB

Length

Max length18
Median length16
Mean length10.42056366
Min length7

Characters and Unicode

Total characters6091601
Distinct characters42
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowThraupidae
2nd rowAccipitridae
3rd rowScolopacidae
4th rowLaridae
5th rowDicaeidae
Value               Count   Frequency (%)
passerellidae       39435   6.7%
parulidae           34481   5.9%
tyrannidae          26165   4.5%
icteridae           19964   3.4%
thraupidae          18114   3.1%
picidae             17391   3.0%
fringillidae        17014   2.9%
scolopacidae        16651   2.8%
turdidae            16039   2.7%
anatidae            15579   2.7%
Other values (229)  363742  62.2%

Most occurring characters

ValueCountFrequency (%)
a 939318
15.4%
i 882003
14.5%
e 764759
12.6%
d 672252
11.0%
r 396820
 
6.5%
l 338348
 
5.6%
c 231479
 
3.8%
o 225694
 
3.7%
n 207894
 
3.4%
t 157040
 
2.6%
Other values (32) 1275994
20.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5507026
90.4%
Uppercase Letter 584575
 
9.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 939318
17.1%
i 882003
16.0%
e 764759
13.9%
d 672252
12.2%
r 396820
7.2%
l 338348
 
6.1%
c 231479
 
4.2%
o 225694
 
4.1%
n 207894
 
3.8%
t 157040
 
2.9%
Other values (11) 691419
12.6%
Uppercase Letter
ValueCountFrequency (%)
P 154750
26.5%
T 101314
17.3%
C 66304
11.3%
A 52928
 
9.1%
S 37881
 
6.5%
M 33527
 
5.7%
F 30447
 
5.2%
L 22426
 
3.8%
I 20551
 
3.5%
R 11420
 
2.0%
Other values (11) 53027
 
9.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 6091601
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 939318
15.4%
i 882003
14.5%
e 764759
12.6%
d 672252
11.0%
r 396820
 
6.5%
l 338348
 
5.6%
c 231479
 
3.8%
o 225694
 
3.7%
n 207894
 
3.4%
t 157040
 
2.6%
Other values (32) 1275994
20.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6091601
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 939318
15.4%
i 882003
14.5%
e 764759
12.6%
d 672252
11.0%
r 396820
 
6.5%
l 338348
 
5.6%
c 231479
 
3.8%
o 225694
 
3.7%
n 207894
 
3.4%
t 157040
 
2.6%
Other values (32) 1275994
20.9%

genus
Text

Distinct2196
Distinct (%)0.4%
Missing338
Missing (%)0.1%
Memory size4.5 MiB

Length

Max length18
Median length15
Mean length8.640262626
Min length3

Characters and Unicode

Total characters5048108
Distinct characters52
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique84 ?
Unique (%)< 0.1%

Sample

1st rowParoaria
2nd rowRostrhamus
3rd rowBartramia
4th rowSterna
5th rowPrionochilus
Value                Count   Frequency (%)
setophaga            18301   3.1%
melospiza            7103    1.2%
turdus               6838    1.2%
calidris             6684    1.1%
vireo                6403    1.1%
agelaius             5379    0.9%
catharus             4885    0.8%
junco                4780    0.8%
geothlypis           4423    0.8%
zonotrichia          4075    0.7%
Other values (2186)  515383  88.2%

Most occurring characters

ValueCountFrequency (%)
a 525335
 
10.4%
o 417081
 
8.3%
i 391104
 
7.7%
s 389715
 
7.7%
r 334572
 
6.6%
e 326852
 
6.5%
u 295820
 
5.9%
l 272317
 
5.4%
t 236060
 
4.7%
n 223789
 
4.4%
Other values (42) 1635463
32.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4463854
88.4%
Uppercase Letter 584254
 
11.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 525335
11.8%
o 417081
 
9.3%
i 391104
 
8.8%
s 389715
 
8.7%
r 334572
 
7.5%
e 326852
 
7.3%
u 295820
 
6.6%
l 272317
 
6.1%
t 236060
 
5.3%
n 223789
 
5.0%
Other values (16) 1051209
23.5%
Uppercase Letter
ValueCountFrequency (%)
C 89334
15.3%
P 84753
14.5%
S 65896
11.3%
A 55035
9.4%
M 47839
8.2%
T 41913
 
7.2%
L 30152
 
5.2%
E 23489
 
4.0%
G 17415
 
3.0%
H 17156
 
2.9%
Other values (16) 111272
19.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5048108
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 525335
 
10.4%
o 417081
 
8.3%
i 391104
 
7.7%
s 389715
 
7.7%
r 334572
 
6.6%
e 326852
 
6.5%
u 295820
 
5.9%
l 272317
 
5.4%
t 236060
 
4.7%
n 223789
 
4.4%
Other values (42) 1635463
32.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5048108
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 525335
 
10.4%
o 417081
 
8.3%
i 391104
 
7.7%
s 389715
 
7.7%
r 334572
 
6.6%
e 326852
 
6.5%
u 295820
 
5.9%
l 272317
 
5.4%
t 236060
 
4.7%
n 223789
 
4.4%
Other values (42) 1635463
32.4%
Distinct2024
Distinct (%)0.3%
Missing495
Missing (%)0.1%
Memory size4.5 MiB

Length

Max length18
Median length15
Mean length8.461623669
Min length3

Characters and Unicode

Total characters4942409
Distinct characters52
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique81 ?
Unique (%)< 0.1%

Sample

1st rowParoaria
2nd rowRostrhamus
3rd rowBartramia
4th rowSterna
5th rowPrionochilus
Value                Count   Frequency (%)
dendroica            14825   2.5%
parus                7485    1.3%
melospiza            7103    1.2%
turdus               6813    1.2%
vireo                6403    1.1%
calidris             6372    1.1%
sterna               6184    1.1%
agelaius             5525    0.9%
carduelis            5507    0.9%
picoides             5086    0.9%
Other values (2014)  512794  87.8%

Most occurring characters

ValueCountFrequency (%)
a 519400
 
10.5%
i 397930
 
8.1%
o 386413
 
7.8%
s 382857
 
7.7%
r 365575
 
7.4%
u 308704
 
6.2%
e 306794
 
6.2%
l 267331
 
5.4%
n 223958
 
4.5%
c 212647
 
4.3%
Other values (42) 1570800
31.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4358312
88.2%
Uppercase Letter 584097
 
11.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 519400
11.9%
i 397930
9.1%
o 386413
 
8.9%
s 382857
 
8.8%
r 365575
 
8.4%
u 308704
 
7.1%
e 306794
 
7.0%
l 267331
 
6.1%
n 223958
 
5.1%
c 212647
 
4.9%
Other values (16) 986703
22.6%
Uppercase Letter
ValueCountFrequency (%)
C 92459
15.8%
P 87930
15.1%
A 56987
9.8%
S 48616
8.3%
M 44863
 
7.7%
T 42410
 
7.3%
D 28038
 
4.8%
L 25718
 
4.4%
E 22717
 
3.9%
G 16758
 
2.9%
Other values (16) 117601
20.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 4942409
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 519400
 
10.5%
i 397930
 
8.1%
o 386413
 
7.8%
s 382857
 
7.7%
r 365575
 
7.4%
u 308704
 
6.2%
e 306794
 
6.2%
l 267331
 
5.4%
n 223958
 
4.5%
c 212647
 
4.3%
Other values (42) 1570800
31.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4942409
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 519400
 
10.5%
i 397930
 
8.1%
o 386413
 
7.8%
s 382857
 
7.7%
r 365575
 
7.4%
u 308704
 
6.2%
e 306794
 
6.2%
l 267331
 
5.4%
n 223958
 
4.5%
c 212647
 
4.3%
Other values (42) 1570800
31.8%

specificEpithet
Text

Missing 

Distinct4643
Distinct (%)0.8%
Missing7917
Missing (%)1.4%
Memory size4.5 MiB

Length

Max length21
Median length16
Mean length8.786944119
Min length3

Characters and Unicode

Total characters5067211
Distinct characters26
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique322 ?
Unique (%)0.1%

Sample

1st rowcapitata
2nd rowsociabilis
3rd rowlongicauda
4th rowhirundo
5th rowplateni
Value                Count   Frequency (%)
melodia              5111    0.9%
phoeniceus           4986    0.9%
hyemalis             4880    0.8%
americana            4671    0.8%
canadensis           3833    0.7%
sandwichensis        3774    0.7%
pusilla              3572    0.6%
alpestris            3345    0.6%
olivaceus            3301    0.6%
carolinensis         3295    0.6%
Other values (4633)  535907  92.9%

Most occurring characters

ValueCountFrequency (%)
a 623579
12.3%
i 556345
11.0%
s 506337
10.0%
u 361837
 
7.1%
r 360464
 
7.1%
e 352637
 
7.0%
l 331573
 
6.5%
n 305381
 
6.0%
c 304652
 
6.0%
o 272602
 
5.4%
Other values (16) 1091804
21.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5067211
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 623579
12.3%
i 556345
11.0%
s 506337
10.0%
u 361837
 
7.1%
r 360464
 
7.1%
e 352637
 
7.0%
l 331573
 
6.5%
n 305381
 
6.0%
c 304652
 
6.0%
o 272602
 
5.4%
Other values (16) 1091804
21.5%

Most occurring scripts

ValueCountFrequency (%)
Latin 5067211
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 623579
12.3%
i 556345
11.0%
s 506337
10.0%
u 361837
 
7.1%
r 360464
 
7.1%
e 352637
 
7.0%
l 331573
 
6.5%
n 305381
 
6.0%
c 304652
 
6.0%
o 272602
 
5.4%
Other values (16) 1091804
21.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5067211
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 623579
12.3%
i 556345
11.0%
s 506337
10.0%
u 361837
 
7.1%
r 360464
 
7.1%
e 352637
 
7.0%
l 331573
 
6.5%
n 305381
 
6.0%
c 304652
 
6.0%
o 272602
 
5.4%
Other values (16) 1091804
21.5%

infraspecificEpithet
Text

Missing 

Distinct6225
Distinct (%)2.3%
Missing308675
Missing (%)52.8%
Memory size4.5 MiB

Length

Max length18
Median length16
Mean length8.918026073
Min length2

Characters and Unicode

Total characters2460635
Distinct characters27
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique702 ?
Unique (%)0.3%

Sample

1st rowsolitarius
2nd rowflavoolivaceus
3rd rowsatrapa
4th rowaustralis
5th rowmalherbii
Value                Count   Frequency (%)
carolinensis         1803    0.7%
olivaceus            1259    0.5%
pinus                1235    0.4%
occidentalis         1175    0.4%
coronata             1165    0.4%
pusilla              1144    0.4%
flammea              1046    0.4%
arizonae             1029    0.4%
hyemalis             1005    0.4%
frontalis            1004    0.4%
Other values (6215)  264052  95.7%

Most occurring characters

ValueCountFrequency (%)
i 290816
11.8%
a 278989
11.3%
s 254771
10.4%
e 190758
 
7.8%
r 173209
 
7.0%
n 170468
 
6.9%
u 157384
 
6.4%
l 149284
 
6.1%
o 136638
 
5.6%
c 130973
 
5.3%
Other values (17) 527345
21.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2460622
> 99.9%
Dash Punctuation 13
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 290816
11.8%
a 278989
11.3%
s 254771
10.4%
e 190758
 
7.8%
r 173209
 
7.0%
n 170468
 
6.9%
u 157384
 
6.4%
l 149284
 
6.1%
o 136638
 
5.6%
c 130973
 
5.3%
Other values (16) 527332
21.4%
Dash Punctuation
ValueCountFrequency (%)
- 13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2460622
> 99.9%
Common 13
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 290816
11.8%
a 278989
11.3%
s 254771
10.4%
e 190758
 
7.8%
r 173209
 
7.0%
n 170468
 
6.9%
u 157384
 
6.4%
l 149284
 
6.1%
o 136638
 
5.6%
c 130973
 
5.3%
Other values (16) 527332
21.4%
Common
ValueCountFrequency (%)
- 13
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2460635
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 290816
11.8%
a 278989
11.3%
s 254771
10.4%
e 190758
 
7.8%
r 173209
 
7.0%
n 170468
 
6.9%
u 157384
 
6.4%
l 149284
 
6.1%
o 136638
 
5.6%
c 130973
 
5.3%
Other values (17) 527345
21.4%
Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length10
Median length7
Mean length8.389954019
Min length4

Characters and Unicode

Total characters4904700
Distinct characters16
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowSPECIES
2nd rowSPECIES
3rd rowSPECIES
4th rowSPECIES
5th rowSPECIES
Value       Count   Frequency (%)
species     300916  51.5%
subspecies  275917  47.2%
genus       7422    1.3%
family      324     0.1%
class       12      < 0.1%
form        1       < 0.1%

Most occurring characters

ValueCountFrequency (%)
S 1437029
29.3%
E 1161088
23.7%
I 577157
11.8%
C 576845
11.8%
P 576833
11.8%
U 283339
 
5.8%
B 275917
 
5.6%
G 7422
 
0.2%
N 7422
 
0.2%
A 336
 
< 0.1%
Other values (6) 1312
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4904700
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 1437029
29.3%
E 1161088
23.7%
I 577157
11.8%
C 576845
11.8%
P 576833
11.8%
U 283339
 
5.8%
B 275917
 
5.6%
G 7422
 
0.2%
N 7422
 
0.2%
A 336
 
< 0.1%
Other values (6) 1312
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 4904700
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 1437029
29.3%
E 1161088
23.7%
I 577157
11.8%
C 576845
11.8%
P 576833
11.8%
U 283339
 
5.8%
B 275917
 
5.6%
G 7422
 
0.2%
N 7422
 
0.2%
A 336
 
< 0.1%
Other values (6) 1312
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4904700
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 1437029
29.3%
E 1161088
23.7%
I 577157
11.8%
C 576845
11.8%
P 576833
11.8%
U 283339
 
5.8%
B 275917
 
5.6%
G 7422
 
0.2%
N 7422
 
0.2%
A 336
 
< 0.1%
Other values (6) 1312
 
< 0.1%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length8
Median length8
Mean length7.793247256
Min length7

Characters and Unicode

Total characters4555870
Distinct characters15
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowACCEPTED
2nd rowACCEPTED
3rd rowACCEPTED
4th rowACCEPTED
5th rowACCEPTED
Value     Count   Frequency (%)
accepted  463081  79.2%
synonym   120866  20.7%
doubtful  645     0.1%

Most occurring characters

ValueCountFrequency (%)
C 926162
20.3%
E 926162
20.3%
T 463726
10.2%
D 463726
10.2%
A 463081
10.2%
P 463081
10.2%
Y 241732
 
5.3%
N 241732
 
5.3%
O 121511
 
2.7%
S 120866
 
2.7%
Other values (5) 124091
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4555870
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 926162
20.3%
E 926162
20.3%
T 463726
10.2%
D 463726
10.2%
A 463081
10.2%
P 463081
10.2%
Y 241732
 
5.3%
N 241732
 
5.3%
O 121511
 
2.7%
S 120866
 
2.7%
Other values (5) 124091
 
2.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 4555870
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 926162
20.3%
E 926162
20.3%
T 463726
10.2%
D 463726
10.2%
A 463081
10.2%
P 463081
10.2%
Y 241732
 
5.3%
N 241732
 
5.3%
O 121511
 
2.7%
S 120866
 
2.7%
Other values (5) 124091
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4555870
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 926162
20.3%
E 926162
20.3%
T 463726
10.2%
D 463726
10.2%
A 463081
10.2%
P 463081
10.2%
Y 241732
 
5.3%
N 241732
 
5.3%
O 121511
 
2.7%
S 120866
 
2.7%
Other values (5) 124091
 
2.7%

datasetKey
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length36
Median length36
Mean length36
Min length36

Characters and Unicode

Total characters21045312
Distinct characters16
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row821cc27a-e3bb-4bc5-ac34-89ada245069d
2nd row821cc27a-e3bb-4bc5-ac34-89ada245069d
3rd row821cc27a-e3bb-4bc5-ac34-89ada245069d
4th row821cc27a-e3bb-4bc5-ac34-89ada245069d
5th row821cc27a-e3bb-4bc5-ac34-89ada245069d
Value                                 Count   Frequency (%)
821cc27a-e3bb-4bc5-ac34-89ada245069d  584592  100.0%

Most occurring characters

ValueCountFrequency (%)
c 2338368
11.1%
a 2338368
11.1%
- 2338368
11.1%
2 1753776
8.3%
b 1753776
8.3%
4 1753776
8.3%
8 1169184
 
5.6%
3 1169184
 
5.6%
5 1169184
 
5.6%
9 1169184
 
5.6%
Other values (6) 4092144
19.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 10522656
50.0%
Lowercase Letter 8184288
38.9%
Dash Punctuation 2338368
 
11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 1753776
16.7%
4 1753776
16.7%
8 1169184
11.1%
3 1169184
11.1%
5 1169184
11.1%
9 1169184
11.1%
1 584592
 
5.6%
7 584592
 
5.6%
0 584592
 
5.6%
6 584592
 
5.6%
Lowercase Letter
ValueCountFrequency (%)
c 2338368
28.6%
a 2338368
28.6%
b 1753776
21.4%
d 1169184
14.3%
e 584592
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 2338368
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12861024
61.1%
Latin 8184288
38.9%

Most frequent character per script

Common
ValueCountFrequency (%)
- 2338368
18.2%
2 1753776
13.6%
4 1753776
13.6%
8 1169184
9.1%
3 1169184
9.1%
5 1169184
9.1%
9 1169184
9.1%
1 584592
 
4.5%
7 584592
 
4.5%
0 584592
 
4.5%
Latin
ValueCountFrequency (%)
c 2338368
28.6%
a 2338368
28.6%
b 1753776
21.4%
d 1169184
14.3%
e 584592
 
7.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21045312
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 2338368
11.1%
a 2338368
11.1%
- 2338368
11.1%
2 1753776
8.3%
b 1753776
8.3%
4 1753776
8.3%
8 1169184
 
5.6%
3 1169184
 
5.6%
5 1169184
 
5.6%
9 1169184
 
5.6%
Other values (6) 4092144
19.4%

publishingCountry
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters1169184
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowUS
2nd rowUS
3rd rowUS
4th rowUS
5th rowUS
Value  Count   Frequency (%)
us     584592  100.0%

Most occurring characters

ValueCountFrequency (%)
U 584592
50.0%
S 584592
50.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1169184
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U 584592
50.0%
S 584592
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1169184
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
U 584592
50.0%
S 584592
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1169184
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
U 584592
50.0%
S 584592
50.0%
Distinct183965
Distinct (%)31.5%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length24
Median length24
Mean length23.99608616
Min length20

Characters and Unicode

Total characters14027920
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique40346 ?
Unique (%)6.9%

Sample

1st row2024-12-02T13:56:05.137Z
2nd row2024-12-02T13:56:08.067Z
3rd row2024-12-02T13:59:48.585Z
4th row2024-12-02T13:56:09.311Z
5th row2024-12-02T13:58:24.805Z
Value                     Count   Frequency (%)
2024-12-02t13:57:59.341z  17      < 0.1%
2024-12-02t13:57:45.007z  16      < 0.1%
2024-12-02t13:57:38.028z  16      < 0.1%
2024-12-02t13:57:53.841z  16      < 0.1%
2024-12-02t13:57:44.964z  15      < 0.1%
2024-12-02t13:58:02.321z  15      < 0.1%
2024-12-02t13:57:53.332z  15      < 0.1%
2024-12-02t13:57:51.208z  15      < 0.1%
2024-12-02t13:58:02.659z  15      < 0.1%
2024-12-02t13:57:41.116z  15      < 0.1%
Other values (183955)     584437  > 99.9%

Most occurring characters

ValueCountFrequency (%)
2 2670609
19.0%
0 1482175
10.6%
1 1475322
10.5%
- 1169184
8.3%
: 1169184
8.3%
4 940152
 
6.7%
5 927783
 
6.6%
3 925430
 
6.6%
T 584592
 
4.2%
Z 584592
 
4.2%
Other values (5) 2098897
15.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 9936348
70.8%
Other Punctuation 1753204
 
12.5%
Dash Punctuation 1169184
 
8.3%
Uppercase Letter 1169184
 
8.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 2670609
26.9%
0 1482175
14.9%
1 1475322
14.8%
4 940152
 
9.5%
5 927783
 
9.3%
3 925430
 
9.3%
7 449478
 
4.5%
9 373966
 
3.8%
6 351326
 
3.5%
8 340107
 
3.4%
Other Punctuation
ValueCountFrequency (%)
: 1169184
66.7%
. 584020
33.3%
Uppercase Letter
ValueCountFrequency (%)
T 584592
50.0%
Z 584592
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 1169184
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12858736
91.7%
Latin 1169184
 
8.3%

Most frequent character per script

Common
ValueCountFrequency (%)
2 2670609
20.8%
0 1482175
11.5%
1 1475322
11.5%
- 1169184
9.1%
: 1169184
9.1%
4 940152
 
7.3%
5 927783
 
7.2%
3 925430
 
7.2%
. 584020
 
4.5%
7 449478
 
3.5%
Other values (3) 1065399
 
8.3%
Latin
ValueCountFrequency (%)
T 584592
50.0%
Z 584592
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14027920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 2670609
19.0%
0 1482175
10.6%
1 1475322
10.5%
- 1169184
8.3%
: 1169184
8.3%
4 940152
 
6.7%
5 927783
 
6.6%
3 925430
 
6.6%
T 584592
 
4.2%
Z 584592
 
4.2%
Other values (5) 2098897
15.0%

elevation
Text

Missing 

Distinct1379
Distinct (%)1.6%
Missing498000
Missing (%)85.2%
Memory size4.5 MiB

Length

Max length6
Median length6
Mean length5.453991131
Min length3

Characters and Unicode

Total characters472272
Distinct characters12
Distinct categories3 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique391 ?
Unique (%)0.5%

Sample

1st row1040.0
2nd row655.0
3rd row1524.0
4th row30.0
5th row220.0
Value                Count  Frequency (%)
1829.0               2382   2.8%
914.0                2016   2.3%
1219.0               1941   2.2%
610.0                1879   2.2%
1524.0               1853   2.1%
1676.0               1775   2.0%
2134.0               1668   1.9%
305.0                1650   1.9%
1067.0               1237   1.4%
1372.0               1235   1.4%
Other values (1368)  68956  79.6%

Most occurring characters

ValueCountFrequency (%)
0 121330
25.7%
. 86592
18.3%
1 60345
12.8%
2 39201
 
8.3%
5 31691
 
6.7%
3 26681
 
5.6%
6 22707
 
4.8%
4 22205
 
4.7%
7 21516
 
4.6%
8 20688
 
4.4%
Other values (2) 19316
 
4.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 385678
81.7%
Other Punctuation 86592
 
18.3%
Dash Punctuation 2
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 121330
31.5%
1 60345
15.6%
2 39201
 
10.2%
5 31691
 
8.2%
3 26681
 
6.9%
6 22707
 
5.9%
4 22205
 
5.8%
7 21516
 
5.6%
8 20688
 
5.4%
9 19314
 
5.0%
Other Punctuation
ValueCountFrequency (%)
. 86592
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 472272
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 121330
25.7%
. 86592
18.3%
1 60345
12.8%
2 39201
 
8.3%
5 31691
 
6.7%
3 26681
 
5.6%
6 22707
 
4.8%
4 22205
 
4.7%
7 21516
 
4.6%
8 20688
 
4.4%
Other values (2) 19316
 
4.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 472272
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 121330
25.7%
. 86592
18.3%
1 60345
12.8%
2 39201
 
8.3%
5 31691
 
6.7%
3 26681
 
5.6%
6 22707
 
4.8%
4 22205
 
4.7%
7 21516
 
4.6%
8 20688
 
4.4%
Other values (2) 19316
 
4.1%

elevationAccuracy
Text

Missing 

Distinct89
Distinct (%)0.9%
Missing574752
Missing (%)98.3%
Memory size4.5 MiB

Length

Max length5
Median length4
Mean length4.386788618
Min length3

Characters and Unicode

Total characters43166
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique18 ?
Unique (%)0.2%

Sample

1st row38.0
2nd row76.0
3rd row76.5
4th row106.5
5th row152.0
Value              Count  Frequency (%)
152.5              2223   22.6%
76.0               1274   12.9%
76.5               1047   10.6%
30.5               536    5.4%
45.5               426    4.3%
61.0               404    4.1%
0.0                394    4.0%
106.5              310    3.2%
91.5               290    2.9%
46.0               265    2.7%
Other values (79)  2671   27.1%

Most occurring characters

ValueCountFrequency (%)
. 9840
22.8%
5 9285
21.5%
0 6214
14.4%
1 4673
10.8%
2 3784
 
8.8%
6 3342
 
7.7%
7 2713
 
6.3%
3 1154
 
2.7%
4 917
 
2.1%
8 711
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 33326
77.2%
Other Punctuation 9840
 
22.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 9285
27.9%
0 6214
18.6%
1 4673
14.0%
2 3784
11.4%
6 3342
 
10.0%
7 2713
 
8.1%
3 1154
 
3.5%
4 917
 
2.8%
8 711
 
2.1%
9 533
 
1.6%
Other Punctuation
ValueCountFrequency (%)
. 9840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 43166
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 9840
22.8%
5 9285
21.5%
0 6214
14.4%
1 4673
10.8%
2 3784
 
8.8%
6 3342
 
7.7%
7 2713
 
6.3%
3 1154
 
2.7%
4 917
 
2.1%
8 711
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 43166
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 9840
22.8%
5 9285
21.5%
0 6214
14.4%
1 4673
10.8%
2 3784
 
8.8%
6 3342
 
7.7%
7 2713
 
6.3%
3 1154
 
2.7%
4 917
 
2.1%
8 711
 
1.6%
Distinct5
Distinct (%)62.5%
Missing584584
Missing (%)> 99.9%
Memory size4.5 MiB

Length

Max length18
Median length17
Mean length17.125
Min length16

Characters and Unicode

Total characters137
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)37.5%

Sample

1st row368.745418614193
2nd row918.1358064728217
3rd row4411.160071289899
4th row4391.045588808231
5th row4411.160071289899
ValueCountFrequency (%)
4411.160071289899 3
37.5%
2413.9981382897595 2
25.0%
368.745418614193 1
 
12.5%
918.1358064728217 1
 
12.5%
4391.045588808231 1
 
12.5%

Most occurring characters

ValueCountFrequency (%)
1 24
17.5%
8 21
15.3%
9 20
14.6%
4 14
10.2%
2 10
7.3%
0 9
 
6.6%
3 9
 
6.6%
. 8
 
5.8%
7 8
 
5.8%
5 8
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 129
94.2%
Other Punctuation 8
 
5.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 24
18.6%
8 21
16.3%
9 20
15.5%
4 14
10.9%
2 10
7.8%
0 9
 
7.0%
3 9
 
7.0%
7 8
 
6.2%
5 8
 
6.2%
6 6
 
4.7%
Other Punctuation
ValueCountFrequency (%)
. 8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 137
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 24
17.5%
8 21
15.3%
9 20
14.6%
4 14
10.2%
2 10
7.3%
0 9
 
6.6%
3 9
 
6.6%
. 8
 
5.8%
7 8
 
5.8%
5 8
 
5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 137
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 24
17.5%
8 21
15.3%
9 20
14.6%
4 14
10.2%
2 10
7.3%
0 9
 
6.6%
3 9
 
6.6%
. 8
 
5.8%
7 8
 
5.8%
5 8
 
5.8%

issue
Text

Distinct: 74
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 186
Median length: 48
Mean length: 53.02144744
Min length: 48

Characters and Unicode

Total characters: 30995914
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): < 0.1%

Sample

1st row: OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT
2nd row: OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT
3rd row: OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT
4th row: OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT
5th row: OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT

Value  Count  Frequency (%)
occurrence_status_inferred_from_individual_count  488030  83.5%
occurrence_status_inferred_from_individual_count;taxon_match_higherrank  40818  7.0%
occurrence_status_inferred_from_individual_count;geodetic_datum_assumed_wgs84  19491  3.3%
occurrence_status_inferred_from_individual_count;taxon_match_fuzzy  14228  2.4%
occurrence_status_inferred_from_individual_count;continent_derived_from_country;continent_invalid  10628  1.8%
occurrence_status_inferred_from_individual_count;geodetic_datum_assumed_wgs84;taxon_match_higherrank  2362  0.4%
occurrence_status_inferred_from_individual_count;country_derived_from_coordinates;geodetic_datum_assumed_wgs84;continent_invalid  2250  0.4%
occurrence_status_inferred_from_individual_count;geodetic_datum_assumed_wgs84;continent_invalid  1482  0.3%
occurrence_status_inferred_from_individual_count;continent_derived_from_country;continent_invalid;taxon_match_higherrank  777  0.1%
occurrence_status_inferred_from_individual_count;continent_country_mismatch  752  0.1%
Other values (64)  3774  0.6%
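
Each issue value listed above is a semicolon-delimited combination of flags, so the per-value counts do not directly show how often an individual flag occurs. A minimal sketch of tallying flags from (value, count) pairs follows; the helper name `flag_totals` is hypothetical, and only the first three rows of the table are used, so the totals are partial and purely illustrative.

```python
from collections import Counter


def flag_totals(value_counts):
    """Sum record counts per individual flag from (value, count) pairs."""
    totals = Counter()
    for value, n in value_counts:
        for flag in value.split(";"):
            if flag:  # ignore empty fragments from blank values or stray ";"
                totals[flag] += n
    return totals


# First three rows of the frequency table for the issue variable.
rows = [
    ("occurrence_status_inferred_from_individual_count", 488030),
    ("occurrence_status_inferred_from_individual_count;taxon_match_higherrank", 40818),
    ("occurrence_status_inferred_from_individual_count;geodetic_datum_assumed_wgs84", 19491),
]

totals = flag_totals(rows)
for flag, n in totals.most_common():
    print(flag, n)
```

Feeding in every distinct value rather than three rows would give the full per-flag totals for the dataset.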

Most occurring characters

ValueCountFrequency (%)
_ 3192048
10.3%
R 3060927
9.9%
N 2568512
 
8.3%
E 2532321
 
8.2%
I 2495611
 
8.1%
C 2478423
 
8.0%
U 2425637
 
7.8%
T 2012103
 
6.5%
O 1910196
 
6.2%
D 1890173
 
6.1%
Other values (17) 6429963
20.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 27626359
89.1%
Connector Punctuation 3192048
 
10.3%
Other Punctuation 121455
 
0.4%
Decimal Number 56052
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R 3060927
11.1%
N 2568512
9.3%
E 2532321
9.2%
I 2495611
9.0%
C 2478423
9.0%
U 2425637
8.8%
T 2012103
7.3%
O 1910196
 
6.9%
D 1890173
 
6.8%
A 1412809
 
5.1%
Other values (13) 4839647
17.5%
Decimal Number
ValueCountFrequency (%)
8 28026
50.0%
4 28026
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3192048
100.0%
Other Punctuation
ValueCountFrequency (%)
; 121455
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 27626359
89.1%
Common 3369555
 
10.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
R 3060927
11.1%
N 2568512
9.3%
E 2532321
9.2%
I 2495611
9.0%
C 2478423
9.0%
U 2425637
8.8%
T 2012103
7.3%
O 1910196
 
6.9%
D 1890173
 
6.8%
A 1412809
 
5.1%
Other values (13) 4839647
17.5%
Common
ValueCountFrequency (%)
_ 3192048
94.7%
; 121455
 
3.6%
8 28026
 
0.8%
4 28026
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 30995914
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 3192048
10.3%
R 3060927
9.9%
N 2568512
 
8.3%
E 2532321
 
8.2%
I 2495611
 
8.1%
C 2478423
 
8.0%
U 2425637
 
7.8%
T 2012103
 
6.5%
O 1910196
 
6.2%
D 1890173
 
6.1%
Other values (17) 6429963
20.7%

mediaType
Text

Missing 

Distinct: 65
Distinct (%): < 0.1%
Missing: 26095
Missing (%): 4.5%
Memory size: 4.5 MiB

Length

Max length: 1165
Median length: 10
Mean length: 10.99926231
Min length: 10

Characters and Unicode

Total characters: 6143055
Distinct characters: 10
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 23
Unique (%): < 0.1%

Sample

1st row: StillImage
2nd row: StillImage
3rd row: StillImage
4th row: StillImage
5th row: StillImage

Value  Count  Frequency (%)
stillimage  544064  97.4%
stillimage;stillimage  6302  1.1%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage  4028  0.7%
stillimage;stillimage;stillimage;stillimage  1341  0.2%
stillimage;stillimage;stillimage  1085  0.2%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage  446  0.1%
stillimage;stillimage;stillimage;stillimage;stillimage  299  0.1%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage  160  < 0.1%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage  119  < 0.1%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage  99  < 0.1%
Other values (55)  554  0.1%

Most occurring characters

ValueCountFrequency (%)
l 1218464
19.8%
S 609232
9.9%
t 609232
9.9%
i 609232
9.9%
I 609232
9.9%
m 609232
9.9%
a 609232
9.9%
g 609232
9.9%
e 609232
9.9%
; 50735
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4873856
79.3%
Uppercase Letter 1218464
 
19.8%
Other Punctuation 50735
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 1218464
25.0%
t 609232
12.5%
i 609232
12.5%
m 609232
12.5%
a 609232
12.5%
g 609232
12.5%
e 609232
12.5%
Uppercase Letter
ValueCountFrequency (%)
S 609232
50.0%
I 609232
50.0%
Other Punctuation
ValueCountFrequency (%)
; 50735
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6092320
99.2%
Common 50735
 
0.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 1218464
20.0%
S 609232
10.0%
t 609232
10.0%
i 609232
10.0%
I 609232
10.0%
m 609232
10.0%
a 609232
10.0%
g 609232
10.0%
e 609232
10.0%
Common
ValueCountFrequency (%)
; 50735
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6143055
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 1218464
19.8%
S 609232
9.9%
t 609232
9.9%
i 609232
9.9%
I 609232
9.9%
m 609232
9.9%
a 609232
9.9%
g 609232
9.9%
e 609232
9.9%
; 50735
 
0.8%
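
The mediaType figures above can be cross-checked with a short calculation: a record listing k media entries contains k - 1 semicolons, so the total number of entries should equal the number of non-missing records plus the number of semicolons. The sketch below repeats that arithmetic with the counts reported in this section and the observation total from the report overview; the helper `media_item_count` is a hypothetical name introduced only for illustration.

```python
def media_item_count(value):
    """Number of entries in a semicolon-delimited mediaType string."""
    return len(value.split(";")) if value else 0


observations = 584592   # number of observations in the dataset
missing = 26095         # mediaType missing count
semicolons = 50735      # occurrences of ";" in mediaType
capital_s = 609232      # occurrences of "S", one per "StillImage" entry

non_missing = observations - missing
total_items = non_missing + semicolons

print(non_missing, total_items, total_items == capital_s)
print(media_item_count("StillImage;StillImage"))
```

The computed total agrees with the per-character count of "S", which suggests that every entry in this column is a StillImage.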
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length5
Median length5
Mean length4.952058872
Min length4

Characters and Unicode

Total characters2894934
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowfalse
2nd rowfalse
3rd rowfalse
4th rowfalse
5th rowfalse
ValueCountFrequency (%)
false 556566
95.2%
true 28026
 
4.8%

Most occurring characters

ValueCountFrequency (%)
e 584592
20.2%
f 556566
19.2%
a 556566
19.2%
l 556566
19.2%
s 556566
19.2%
t 28026
 
1.0%
r 28026
 
1.0%
u 28026
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2894934
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 584592
20.2%
f 556566
19.2%
a 556566
19.2%
l 556566
19.2%
s 556566
19.2%
t 28026
 
1.0%
r 28026
 
1.0%
u 28026
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2894934
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 584592
20.2%
f 556566
19.2%
a 556566
19.2%
l 556566
19.2%
s 556566
19.2%
t 28026
 
1.0%
r 28026
 
1.0%
u 28026
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2894934
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 584592
20.2%
f 556566
19.2%
a 556566
19.2%
l 556566
19.2%
s 556566
19.2%
t 28026
 
1.0%
r 28026
 
1.0%
u 28026
 
1.0%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length5
Median length5
Mean length4.999095095
Min length4

Characters and Unicode

Total characters2922431
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowfalse
2nd rowfalse
3rd rowfalse
4th rowfalse
5th rowfalse
ValueCountFrequency (%)
false 584063
99.9%
true 529
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e 584592
20.0%
f 584063
20.0%
a 584063
20.0%
l 584063
20.0%
s 584063
20.0%
t 529
 
< 0.1%
r 529
 
< 0.1%
u 529
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2922431
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 584592
20.0%
f 584063
20.0%
a 584063
20.0%
l 584063
20.0%
s 584063
20.0%
t 529
 
< 0.1%
r 529
 
< 0.1%
u 529
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 2922431
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 584592
20.0%
f 584063
20.0%
a 584063
20.0%
l 584063
20.0%
s 584063
20.0%
t 529
 
< 0.1%
r 529
 
< 0.1%
u 529
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2922431
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 584592
20.0%
f 584063
20.0%
a 584063
20.0%
l 584063
20.0%
s 584063
20.0%
t 529
 
< 0.1%
r 529
 
< 0.1%
u 529
 
< 0.1%
Distinct18875
Distinct (%)3.2%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length8
Median length7
Mean length7.002856693
Min length3

Characters and Unicode

Total characters4093814
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2553 ?
Unique (%)0.4%

Sample

1st row2492087
2nd row2480415
3rd row2481705
4th row9367409
5th row5229959
ValueCountFrequency (%)
9409198 2991
 
0.5%
5229252 1915
 
0.3%
9685907 1565
 
0.3%
9791464 1425
 
0.2%
2489985 1281
 
0.2%
5231142 1245
 
0.2%
2473421 1244
 
0.2%
2489670 1187
 
0.2%
7191634 1155
 
0.2%
2489730 1077
 
0.2%
Other values (18865) 569507
97.4%

Most occurring characters

ValueCountFrequency (%)
2 589019
14.4%
4 482025
11.8%
1 468053
11.4%
7 445367
10.9%
9 442904
10.8%
8 390063
9.5%
6 370970
9.1%
5 314508
7.7%
0 304387
7.4%
3 286518
7.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4093814
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 589019
14.4%
4 482025
11.8%
1 468053
11.4%
7 445367
10.9%
9 442904
10.8%
8 390063
9.5%
6 370970
9.1%
5 314508
7.7%
0 304387
7.4%
3 286518
7.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4093814
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 589019
14.4%
4 482025
11.8%
1 468053
11.4%
7 445367
10.9%
9 442904
10.8%
8 390063
9.5%
6 370970
9.1%
5 314508
7.7%
0 304387
7.4%
3 286518
7.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4093814
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 589019
14.4%
4 482025
11.8%
1 468053
11.4%
7 445367
10.9%
9 442904
10.8%
8 390063
9.5%
6 370970
9.1%
5 314508
7.7%
0 304387
7.4%
3 286518
7.0%
Distinct18485
Distinct (%)3.2%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length8
Median length7
Mean length7.009553672
Min length3

Characters and Unicode

Total characters4097729
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2480 ?
Unique (%)0.4%

Sample

1st row2492087
2nd row2480415
3rd row2481705
4th row9367409
5th row5229959
ValueCountFrequency (%)
9409198 2991
 
0.5%
7192429 1918
 
0.3%
7191991 1808
 
0.3%
9685907 1565
 
0.3%
9791464 1425
 
0.2%
7341805 1363
 
0.2%
2489985 1286
 
0.2%
5231142 1245
 
0.2%
2473421 1244
 
0.2%
2489670 1187
 
0.2%
Other values (18475) 568560
97.3%

Most occurring characters

ValueCountFrequency (%)
2 550899
13.4%
4 487888
11.9%
1 454466
11.1%
9 450768
11.0%
7 438367
10.7%
8 395864
9.7%
6 370734
9.0%
0 331876
8.1%
5 312473
7.6%
3 304394
7.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4097729
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 550899
13.4%
4 487888
11.9%
1 454466
11.1%
9 450768
11.0%
7 438367
10.7%
8 395864
9.7%
6 370734
9.0%
0 331876
8.1%
5 312473
7.6%
3 304394
7.4%

Most occurring scripts

ValueCountFrequency (%)
Common 4097729
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 550899
13.4%
4 487888
11.9%
1 454466
11.1%
9 450768
11.0%
7 438367
10.7%
8 395864
9.7%
6 370734
9.0%
0 331876
8.1%
5 312473
7.6%
3 304394
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4097729
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 550899
13.4%
4 487888
11.9%
1 454466
11.1%
9 450768
11.0%
7 438367
10.7%
8 395864
9.7%
6 370734
9.0%
0 331876
8.1%
5 312473
7.6%
3 304394
7.4%

kingdomKey
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters584592
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1
ValueCountFrequency (%)
1 584592
100.0%

Most occurring characters

ValueCountFrequency (%)
1 584592
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 584592
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 584592
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 584592
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 584592
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 584592
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 584592
100.0%

phylumKey
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing5
Missing (%)< 0.1%
Memory size4.5 MiB

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters1169174
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row44
2nd row44
3rd row44
4th row44
5th row44
ValueCountFrequency (%)
44 584587
100.0%

Most occurring characters

ValueCountFrequency (%)
4 1169174
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1169174
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 1169174
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1169174
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4 1169174
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1169174
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 1169174
100.0%

classKey
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing5
Missing (%)< 0.1%
Memory size4.5 MiB

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters1753761
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row212
2nd row212
3rd row212
4th row212
5th row212
ValueCountFrequency (%)
212 584587
100.0%

Most occurring characters

ValueCountFrequency (%)
2 1169174
66.7%
1 584587
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1753761
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 1169174
66.7%
1 584587
33.3%

Most occurring scripts

ValueCountFrequency (%)
Common 1753761
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 1169174
66.7%
1 584587
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1753761
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 1169174
66.7%
1 584587
33.3%
Distinct42
Distinct (%)< 0.1%
Missing20
Missing (%)< 0.1%
Memory size4.5 MiB

Length

Max length8
Median length3
Mean length3.739311838
Min length3

Characters and Unicode

Total characters2185897
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row729
2nd row7191147
3rd row7192402
4th row7192402
5th row729
ValueCountFrequency (%)
729 372474
63.7%
7192402 44387
 
7.6%
724 22599
 
3.9%
1448 18185
 
3.1%
1108 15668
 
2.7%
723 14813
 
2.5%
1446 12800
 
2.2%
7191147 11414
 
2.0%
1447 7822
 
1.3%
1445 7419
 
1.3%
Other values (32) 56991
 
9.7%

Most occurring characters

ValueCountFrequency (%)
7 536873
24.6%
2 518543
23.7%
9 478674
21.9%
1 217733
10.0%
4 205606
 
9.4%
0 84766
 
3.9%
5 50334
 
2.3%
8 44059
 
2.0%
3 29305
 
1.3%
6 20004
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2185897
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 536873
24.6%
2 518543
23.7%
9 478674
21.9%
1 217733
10.0%
4 205606
 
9.4%
0 84766
 
3.9%
5 50334
 
2.3%
8 44059
 
2.0%
3 29305
 
1.3%
6 20004
 
0.9%

Most occurring scripts

ValueCountFrequency (%)
Common 2185897
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
7 536873
24.6%
2 518543
23.7%
9 478674
21.9%
1 217733
10.0%
4 205606
 
9.4%
0 84766
 
3.9%
5 50334
 
2.3%
8 44059
 
2.0%
3 29305
 
1.3%
6 20004
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2185897
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7 536873
24.6%
2 518543
23.7%
9 478674
21.9%
1 217733
10.0%
4 205606
 
9.4%
0 84766
 
3.9%
5 50334
 
2.3%
8 44059
 
2.0%
3 29305
 
1.3%
6 20004
 
0.9%
Distinct239
Distinct (%)< 0.1%
Missing17
Missing (%)< 0.1%
Memory size4.5 MiB

Length

Max length7
Median length4
Mean length4.376314416
Min length4

Characters and Unicode

Total characters2558284
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row9352
2nd row2877
3rd row5282
4th row9316
5th row4287160
ValueCountFrequency (%)
9410667 39435
 
6.7%
5263 34481
 
5.9%
5291 26165
 
4.5%
6176 19964
 
3.4%
9352 18114
 
3.1%
9333 17391
 
3.0%
5242 17014
 
2.9%
5282 16651
 
2.8%
5290 16039
 
2.7%
2986 15579
 
2.7%
Other values (229) 363742
62.2%

Most occurring characters

ValueCountFrequency (%)
2 436950
17.1%
5 376699
14.7%
9 375236
14.7%
3 343158
13.4%
6 246705
9.6%
1 193745
7.6%
8 150651
 
5.9%
0 147912
 
5.8%
7 147562
 
5.8%
4 139666
 
5.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2558284
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 436950
17.1%
5 376699
14.7%
9 375236
14.7%
3 343158
13.4%
6 246705
9.6%
1 193745
7.6%
8 150651
 
5.9%
0 147912
 
5.8%
7 147562
 
5.8%
4 139666
 
5.5%

Most occurring scripts

ValueCountFrequency (%)
Common 2558284
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 436950
17.1%
5 376699
14.7%
9 375236
14.7%
3 343158
13.4%
6 246705
9.6%
1 193745
7.6%
8 150651
 
5.9%
0 147912
 
5.8%
7 147562
 
5.8%
4 139666
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2558284
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 436950
17.1%
5 376699
14.7%
9 375236
14.7%
3 343158
13.4%
6 246705
9.6%
1 193745
7.6%
8 150651
 
5.9%
0 147912
 
5.8%
7 147562
 
5.8%
4 139666
 
5.5%
Distinct2196
Distinct (%)0.4%
Missing338
Missing (%)0.1%
Memory size4.5 MiB

Length

Max length8
Median length7
Mean length7.005993968
Min length7

Characters and Unicode

Total characters4093280
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique84 ?
Unique (%)< 0.1%

Sample

1st row2492080
2nd row2480414
3rd row2481704
4th row2481227
5th row2484660
ValueCountFrequency (%)
2489984 18301
 
3.1%
2492191 7103
 
1.2%
2490714 6838
 
1.2%
2481739 6684
 
1.1%
2487406 6403
 
1.1%
2484444 5379
 
0.9%
2490799 4885
 
0.8%
2492009 4780
 
0.8%
2489637 4423
 
0.8%
6173226 4075
 
0.7%
Other values (2186) 515383
88.2%

Most occurring characters

ValueCountFrequency (%)
4 811117
19.8%
2 756156
18.5%
8 500110
12.2%
9 498768
12.2%
7 305303
 
7.5%
1 276534
 
6.8%
3 259496
 
6.3%
6 243474
 
5.9%
0 240481
 
5.9%
5 201841
 
4.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4093280
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 811117
19.8%
2 756156
18.5%
8 500110
12.2%
9 498768
12.2%
7 305303
 
7.5%
1 276534
 
6.8%
3 259496
 
6.3%
6 243474
 
5.9%
0 240481
 
5.9%
5 201841
 
4.9%

Most occurring scripts

ValueCountFrequency (%)
Common 4093280
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4 811117
19.8%
2 756156
18.5%
8 500110
12.2%
9 498768
12.2%
7 305303
 
7.5%
1 276534
 
6.8%
3 259496
 
6.3%
6 243474
 
5.9%
0 240481
 
5.9%
5 201841
 
4.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4093280
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 811117
19.8%
2 756156
18.5%
8 500110
12.2%
9 498768
12.2%
7 305303
 
7.5%
1 276534
 
6.8%
3 259496
 
6.3%
6 243474
 
5.9%
0 240481
 
5.9%
5 201841
 
4.9%

speciesKey
Text

Missing 

Distinct8234
Distinct (%)1.4%
Missing7853
Missing (%)1.3%
Memory size4.5 MiB

Length

Max length8
Median length7
Mean length7.008723183
Min length7

Characters and Unicode

Total characters4042204
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique681 ?
Unique (%)0.1%

Sample

1st row2492087
2nd row2480415
3rd row2481705
4th row9367409
5th row5229959
ValueCountFrequency (%)
2492196 5111
 
0.9%
9409198 4956
 
0.9%
9362842 4264
 
0.7%
5231142 3641
 
0.6%
9415596 3345
 
0.6%
2489670 2973
 
0.5%
9510564 2633
 
0.5%
5789284 1921
 
0.3%
5231132 1886
 
0.3%
2478259 1885
 
0.3%
Other values (8224) 544124
94.3%

Most occurring characters

ValueCountFrequency (%)
2 724124
17.9%
4 637906
15.8%
9 466985
11.6%
8 437618
10.8%
5 324481
8.0%
1 308952
7.6%
3 303698
7.5%
7 294100
7.3%
0 292502
7.2%
6 251838
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4042204
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 724124
17.9%
4 637906
15.8%
9 466985
11.6%
8 437618
10.8%
5 324481
8.0%
1 308952
7.6%
3 303698
7.5%
7 294100
7.3%
0 292502
7.2%
6 251838
 
6.2%

Most occurring scripts

ValueCountFrequency (%)
Common 4042204
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 724124
17.9%
4 637906
15.8%
9 466985
11.6%
8 437618
10.8%
5 324481
8.0%
1 308952
7.6%
3 303698
7.5%
7 294100
7.3%
0 292502
7.2%
6 251838
 
6.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4042204
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 724124
17.9%
4 637906
15.8%
9 466985
11.6%
8 437618
10.8%
5 324481
8.0%
1 308952
7.6%
3 303698
7.5%
7 294100
7.3%
0 292502
7.2%
6 251838
 
6.2%

species
Text

Missing 

Distinct8234
Distinct (%)1.4%
Missing7853
Missing (%)1.3%
Memory size4.5 MiB

Length

Max length36
Median length29
Mean length18.43415132
Min length9

Characters and Unicode

Total characters10631694
Distinct characters53
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique681 ?
Unique (%)0.1%

Sample

1st rowParoaria capitata
2nd rowRostrhamus sociabilis
3rd rowBartramia longicauda
4th rowSterna hirundo
5th rowPrionochilus plateni
ValueCountFrequency (%)
setophaga 18263
 
1.6%
melospiza 7103
 
0.6%
turdus 6787
 
0.6%
calidris 6682
 
0.6%
vireo 6370
 
0.6%
agelaius 5367
 
0.5%
melodia 5111
 
0.4%
phoeniceus 4986
 
0.4%
catharus 4865
 
0.4%
hyemalis 4856
 
0.4%
Other values (6762) 1083406
93.9%

Most occurring characters

ValueCountFrequency (%)
a 1140380
 
10.7%
i 948191
 
8.9%
s 897805
 
8.4%
r 687139
 
6.5%
o 682086
 
6.4%
e 675341
 
6.4%
u 655256
 
6.2%
l 598920
 
5.6%
577057
 
5.4%
n 529238
 
5.0%
Other values (43) 3240281
30.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9477896
89.1%
Space Separator 577057
 
5.4%
Uppercase Letter 576741
 
5.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1140380
12.0%
i 948191
10.0%
s 897805
9.5%
r 687139
 
7.2%
o 682086
 
7.2%
e 675341
 
7.1%
u 655256
 
6.9%
l 598920
 
6.3%
n 529238
 
5.6%
c 507575
 
5.4%
Other values (16) 2155965
22.7%
Uppercase Letter
ValueCountFrequency (%)
C 87799
15.2%
P 83433
14.5%
S 65465
11.4%
A 54586
9.5%
M 47035
8.2%
T 41684
 
7.2%
L 29823
 
5.2%
E 23359
 
4.1%
G 17319
 
3.0%
H 17005
 
2.9%
Other values (16) 109233
18.9%
Space Separator
ValueCountFrequency (%)
577057
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10054637
94.6%
Common 577057
 
5.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1140380
11.3%
i 948191
 
9.4%
s 897805
 
8.9%
r 687139
 
6.8%
o 682086
 
6.8%
e 675341
 
6.7%
u 655256
 
6.5%
l 598920
 
6.0%
n 529238
 
5.3%
c 507575
 
5.0%
Other values (42) 2732706
27.2%
Common
ValueCountFrequency (%)
577057
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10631694
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1140380
 
10.7%
i 948191
 
8.9%
s 897805
 
8.4%
r 687139
 
6.5%
o 682086
 
6.4%
e 675341
 
6.4%
u 655256
 
6.2%
l 598920
 
5.6%
577057
 
5.4%
n 529238
 
5.0%
Other values (43) 3240281
30.5%
Distinct18485
Distinct (%)3.2%
Missing0
Missing (%)0.0%
Memory size4.5 MiB

Length

Max length101
Median length69
Mean length36.3684775
Min length4

Characters and Unicode

Total characters21260721
Distinct characters78
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2480 ?
Unique (%)0.4%

Sample

1st rowParoaria capitata (d'Orbigny & Lafresnaye, 1837)
2nd rowRostrhamus sociabilis (Vieillot, 1817)
3rd rowBartramia longicauda (Bechstein, 1812)
4th rowSterna hirundo Linnaeus, 1758
5th rowPrionochilus plateni W.Blasius, 1888
ValueCountFrequency (%)
linnaeus 96059
 
3.9%
1758 62975
 
2.6%
1766 31923
 
1.3%
1789 22527
 
0.9%
21216
 
0.9%
vieillot 20464
 
0.8%
setophaga 18301
 
0.8%
j.f.gmelin 17514
 
0.7%
ridgway 15118
 
0.6%
gmelin 12289
 
0.5%
Other values (11309) 2119945
86.9%

Most occurring characters

ValueCountFrequency (%)
1853739
 
8.7%
a 1757649
 
8.3%
i 1549698
 
7.3%
s 1388187
 
6.5%
e 1256072
 
5.9%
n 1103662
 
5.2%
r 1034765
 
4.9%
o 978110
 
4.6%
u 968585
 
4.6%
l 964073
 
4.5%
Other values (68) 8406181
39.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15002770
70.6%
Decimal Number 1908932
 
9.0%
Space Separator 1853739
 
8.7%
Uppercase Letter 1233477
 
5.8%
Other Punctuation 637605
 
3.0%
Close Punctuation 310825
 
1.5%
Open Punctuation 310825
 
1.5%
Dash Punctuation 2548
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1757649
11.7%
i 1549698
10.3%
s 1388187
9.3%
e 1256072
 
8.4%
n 1103662
 
7.4%
r 1034765
 
6.9%
o 978110
 
6.5%
u 968585
 
6.5%
l 964073
 
6.4%
t 729354
 
4.9%
Other values (23) 3272615
21.8%
Uppercase Letter
ValueCountFrequency (%)
L 167639
13.6%
S 133493
10.8%
P 115415
 
9.4%
C 112848
 
9.1%
G 76000
 
6.2%
A 75543
 
6.1%
B 72757
 
5.9%
M 66309
 
5.4%
T 62888
 
5.1%
R 49146
 
4.0%
Other values (17) 301439
24.4%
Decimal Number
ValueCountFrequency (%)
1 562050
29.4%
8 422555
22.1%
7 235719
12.3%
9 147314
 
7.7%
6 128558
 
6.7%
5 121276
 
6.4%
3 81907
 
4.3%
2 75569
 
4.0%
0 67710
 
3.5%
4 66274
 
3.5%
Other Punctuation
ValueCountFrequency (%)
, 477346
74.9%
. 138018
 
21.6%
& 21216
 
3.3%
' 1025
 
0.2%
Space Separator
ValueCountFrequency (%)
1853739
100.0%
Close Punctuation
ValueCountFrequency (%)
) 310825
100.0%
Open Punctuation
ValueCountFrequency (%)
( 310825
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2548
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 16236247
76.4%
Common 5024474
 
23.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1757649
 
10.8%
i 1549698
 
9.5%
s 1388187
 
8.5%
e 1256072
 
7.7%
n 1103662
 
6.8%
r 1034765
 
6.4%
o 978110
 
6.0%
u 968585
 
6.0%
l 964073
 
5.9%
t 729354
 
4.5%
Other values (50) 4506092
27.8%
Common
ValueCountFrequency (%)
1853739
36.9%
1 562050
 
11.2%
, 477346
 
9.5%
8 422555
 
8.4%
) 310825
 
6.2%
( 310825
 
6.2%
7 235719
 
4.7%
9 147314
 
2.9%
. 138018
 
2.7%
6 128558
 
2.6%
Other values (8) 437525
 
8.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21254630
> 99.9%
None 6091
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1853739
 
8.7%
a 1757649
 
8.3%
i 1549698
 
7.3%
s 1388187
 
6.5%
e 1256072
 
5.9%
n 1103662
 
5.2%
r 1034765
 
4.9%
o 978110
 
4.6%
u 968585
 
4.6%
l 964073
 
4.5%
Other values (60) 8400090
39.5%
None
ValueCountFrequency (%)
ü 4413
72.5%
é 890
 
14.6%
á 359
 
5.9%
è 250
 
4.1%
ä 90
 
1.5%
ö 60
 
1.0%
É 21
 
0.3%
ø 8
 
0.1%
Distinct: 22061
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 65
Median length: 50
Mean length: 23.69967259
Min length: 7

Characters and Unicode

Total characters: 13854639
Distinct characters: 58
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3436
Unique (%): 0.6%

Sample

1st row: Paroaria capitata
2nd row: Rostrhamus sociabilis
3rd row: Bartramia longicauda
4th row: Sterna hirundo
5th row: Prionochilus plateni

Value | Count | Frequency (%)
dendroica | 14826 | 1.0%
parus | 7485 | 0.5%
melospiza | 7103 | 0.5%
turdus | 6813 | 0.5%
vireo | 6404 | 0.4%
calidris | 6376 | 0.4%
sterna | 6184 | 0.4%
hyemalis | 5963 | 0.4%
melodia | 5927 | 0.4%
carduelis | 5742 | 0.4%
Other values (10903) | 1419872 | 95.1%

Most occurring characters

ValueCountFrequency (%)
a 1477651
 
10.7%
i 1303096
 
9.4%
s 1190344
 
8.6%
r 934012
 
6.7%
908103
 
6.6%
e 885911
 
6.4%
u 853994
 
6.2%
o 821323
 
5.9%
l 776498
 
5.6%
n 730705
 
5.3%
Other values (48) 3973002
28.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12360285
89.2%
Space Separator 908103
 
6.6%
Uppercase Letter 584699
 
4.2%
Other Punctuation 1511
 
< 0.1%
Dash Punctuation 41
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1477651
12.0%
i 1303096
10.5%
s 1190344
9.6%
r 934012
 
7.6%
e 885911
 
7.2%
u 853994
 
6.9%
o 821323
 
6.6%
l 776498
 
6.3%
n 730705
 
5.9%
c 671384
 
5.4%
Other values (16) 2715367
22.0%
Uppercase Letter
ValueCountFrequency (%)
C 92584
15.8%
P 87973
15.0%
A 57125
9.8%
S 48743
8.3%
M 44873
 
7.7%
T 42452
 
7.3%
D 28042
 
4.8%
L 25741
 
4.4%
E 22719
 
3.9%
G 16764
 
2.9%
Other values (16) 117683
20.1%
Other Punctuation
ValueCountFrequency (%)
. 1121
74.2%
" 348
 
23.0%
/ 37
 
2.4%
? 5
 
0.3%
Space Separator
ValueCountFrequency (%)
908103
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 41
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12944984
93.4%
Common 909655
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1477651
11.4%
i 1303096
 
10.1%
s 1190344
 
9.2%
r 934012
 
7.2%
e 885911
 
6.8%
u 853994
 
6.6%
o 821323
 
6.3%
l 776498
 
6.0%
n 730705
 
5.6%
c 671384
 
5.2%
Other values (42) 3300066
25.5%
Common
ValueCountFrequency (%)
908103
99.8%
. 1121
 
0.1%
" 348
 
< 0.1%
- 41
 
< 0.1%
/ 37
 
< 0.1%
? 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13854639
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1477651
 
10.7%
i 1303096
 
9.4%
s 1190344
 
8.6%
r 934012
 
6.7%
908103
 
6.6%
e 885911
 
6.4%
u 853994
 
6.2%
o 821323
 
5.9%
l 776498
 
5.6%
n 730705
 
5.3%
Other values (48) 3973002
28.7%

protocol
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1753776
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: EML
2nd row: EML
3rd row: EML
4th row: EML
5th row: EML

Value | Count | Frequency (%)
eml | 584592 | 100.0%

Most occurring characters

ValueCountFrequency (%)
E 584592
33.3%
M 584592
33.3%
L 584592
33.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1753776
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 584592
33.3%
M 584592
33.3%
L 584592
33.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 1753776
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 584592
33.3%
M 584592
33.3%
L 584592
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1753776
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 584592
33.3%
M 584592
33.3%
L 584592
33.3%
Distinct: 183965
Distinct (%): 31.5%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 24
Median length: 24
Mean length: 23.99608616
Min length: 20

Characters and Unicode

Total characters: 14027920
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 40346
Unique (%): 6.9%

Sample

1st row: 2024-12-02T13:56:05.137Z
2nd row: 2024-12-02T13:56:08.067Z
3rd row: 2024-12-02T13:59:48.585Z
4th row: 2024-12-02T13:56:09.311Z
5th row: 2024-12-02T13:58:24.805Z

Value | Count | Frequency (%)
2024-12-02t13:57:59.341z | 17 | < 0.1%
2024-12-02t13:57:45.007z | 16 | < 0.1%
2024-12-02t13:57:38.028z | 16 | < 0.1%
2024-12-02t13:57:53.841z | 16 | < 0.1%
2024-12-02t13:57:44.964z | 15 | < 0.1%
2024-12-02t13:58:02.321z | 15 | < 0.1%
2024-12-02t13:57:53.332z | 15 | < 0.1%
2024-12-02t13:57:51.208z | 15 | < 0.1%
2024-12-02t13:58:02.659z | 15 | < 0.1%
2024-12-02t13:57:41.116z | 15 | < 0.1%
Other values (183955) | 584437 | > 99.9%

Most occurring characters

ValueCountFrequency (%)
2 2670609
19.0%
0 1482175
10.6%
1 1475322
10.5%
- 1169184
8.3%
: 1169184
8.3%
4 940152
 
6.7%
5 927783
 
6.6%
3 925430
 
6.6%
T 584592
 
4.2%
Z 584592
 
4.2%
Other values (5) 2098897
15.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 9936348
70.8%
Other Punctuation 1753204
 
12.5%
Dash Punctuation 1169184
 
8.3%
Uppercase Letter 1169184
 
8.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 2670609
26.9%
0 1482175
14.9%
1 1475322
14.8%
4 940152
 
9.5%
5 927783
 
9.3%
3 925430
 
9.3%
7 449478
 
4.5%
9 373966
 
3.8%
6 351326
 
3.5%
8 340107
 
3.4%
Other Punctuation
ValueCountFrequency (%)
: 1169184
66.7%
. 584020
33.3%
Uppercase Letter
ValueCountFrequency (%)
T 584592
50.0%
Z 584592
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 1169184
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12858736
91.7%
Latin 1169184
 
8.3%

Most frequent character per script

Common
ValueCountFrequency (%)
2 2670609
20.8%
0 1482175
11.5%
1 1475322
11.5%
- 1169184
9.1%
: 1169184
9.1%
4 940152
 
7.3%
5 927783
 
7.2%
3 925430
 
7.2%
. 584020
 
4.5%
7 449478
 
3.5%
Other values (3) 1065399
 
8.3%
Latin
ValueCountFrequency (%)
T 584592
50.0%
Z 584592
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14027920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 2670609
19.0%
0 1482175
10.6%
1 1475322
10.5%
- 1169184
8.3%
: 1169184
8.3%
4 940152
 
6.7%
5 927783
 
6.6%
3 925430
 
6.6%
T 584592
 
4.2%
Z 584592
 
4.2%
Other values (5) 2098897
15.0%

lastCrawled
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 24
Median length: 24
Mean length: 24
Min length: 24

Characters and Unicode

Total characters: 14030208
Distinct characters: 12
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2024-12-02T11:48:23.416Z
2nd row: 2024-12-02T11:48:23.416Z
3rd row: 2024-12-02T11:48:23.416Z
4th row: 2024-12-02T11:48:23.416Z
5th row: 2024-12-02T11:48:23.416Z

Value | Count | Frequency (%)
2024-12-02t11:48:23.416z | 584592 | 100.0%

Most occurring characters

ValueCountFrequency (%)
2 2922960
20.8%
1 2338368
16.7%
4 1753776
12.5%
0 1169184
 
8.3%
- 1169184
 
8.3%
: 1169184
 
8.3%
T 584592
 
4.2%
8 584592
 
4.2%
3 584592
 
4.2%
. 584592
 
4.2%
Other values (2) 1169184
 
8.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 9938064
70.8%
Other Punctuation 1753776
 
12.5%
Dash Punctuation 1169184
 
8.3%
Uppercase Letter 1169184
 
8.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 2922960
29.4%
1 2338368
23.5%
4 1753776
17.6%
0 1169184
 
11.8%
8 584592
 
5.9%
3 584592
 
5.9%
6 584592
 
5.9%
Other Punctuation
ValueCountFrequency (%)
: 1169184
66.7%
. 584592
33.3%
Uppercase Letter
ValueCountFrequency (%)
T 584592
50.0%
Z 584592
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 1169184
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12861024
91.7%
Latin 1169184
 
8.3%

Most frequent character per script

Common
ValueCountFrequency (%)
2 2922960
22.7%
1 2338368
18.2%
4 1753776
13.6%
0 1169184
 
9.1%
- 1169184
 
9.1%
: 1169184
 
9.1%
8 584592
 
4.5%
3 584592
 
4.5%
. 584592
 
4.5%
6 584592
 
4.5%
Latin
ValueCountFrequency (%)
T 584592
50.0%
Z 584592
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14030208
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 2922960
20.8%
1 2338368
16.7%
4 1753776
12.5%
0 1169184
 
8.3%
- 1169184
 
8.3%
: 1169184
 
8.3%
T 584592
 
4.2%
8 584592
 
4.2%
3 584592
 
4.2%
. 584592
 
4.2%
Other values (2) 1169184
 
8.3%
Distinct: 2
Distinct (%): < 0.1%
Missing: 3194
Missing (%): 0.5%
Memory size: 4.5 MiB

Length

Max length: 5
Median length: 4
Mean length: 4.372956219
Min length: 4

Characters and Unicode

Total characters: 2542428
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: true
2nd row: false
3rd row: false
4th row: false
5th row: true

Value | Count | Frequency (%)
true | 364562 | 62.7%
false | 216836 | 37.3%

Most occurring characters

ValueCountFrequency (%)
e 581398
22.9%
t 364562
14.3%
r 364562
14.3%
u 364562
14.3%
f 216836
 
8.5%
a 216836
 
8.5%
l 216836
 
8.5%
s 216836
 
8.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2542428
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 581398
22.9%
t 364562
14.3%
r 364562
14.3%
u 364562
14.3%
f 216836
 
8.5%
a 216836
 
8.5%
l 216836
 
8.5%
s 216836
 
8.5%

Most occurring scripts

ValueCountFrequency (%)
Latin 2542428
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 581398
22.9%
t 364562
14.3%
r 364562
14.3%
u 364562
14.3%
f 216836
 
8.5%
a 216836
 
8.5%
l 216836
 
8.5%
s 216836
 
8.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2542428
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 581398
22.9%
t 364562
14.3%
r 364562
14.3%
u 364562
14.3%
f 216836
 
8.5%
a 216836
 
8.5%
l 216836
 
8.5%
s 216836
 
8.5%
Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 5
Median length: 5
Mean length: 4.992324561
Min length: 4

Characters and Unicode

Total characters: 2918473
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: false
2nd row: false
3rd row: false
4th row: false
5th row: false

Value | Count | Frequency (%)
false | 580105 | 99.2%
true | 4487 | 0.8%

Most occurring characters

ValueCountFrequency (%)
e 584592
20.0%
f 580105
19.9%
a 580105
19.9%
l 580105
19.9%
s 580105
19.9%
t 4487
 
0.2%
r 4487
 
0.2%
u 4487
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2918473
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 584592
20.0%
f 580105
19.9%
a 580105
19.9%
l 580105
19.9%
s 580105
19.9%
t 4487
 
0.2%
r 4487
 
0.2%
u 4487
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 2918473
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 584592
20.0%
f 580105
19.9%
a 580105
19.9%
l 580105
19.9%
s 580105
19.9%
t 4487
 
0.2%
r 4487
 
0.2%
u 4487
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2918473
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 584592
20.0%
f 580105
19.9%
a 580105
19.9%
l 580105
19.9%
s 580105
19.9%
t 4487
 
0.2%
r 4487
 
0.2%
u 4487
 
0.2%

gbifRegion
Text

Missing 

Distinct: 7
Distinct (%): < 0.1%
Missing: 19462
Missing (%): 3.3%
Memory size: 4.5 MiB

Length

Max length: 13
Median length: 13
Mean length: 10.62288677
Min length: 4

Characters and Unicode

Total characters: 6003312
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: LATIN_AMERICA
2nd row: NORTH_AMERICA
3rd row: NORTH_AMERICA
4th row: NORTH_AMERICA
5th row: ASIA

Value | Count | Frequency (%)
north_america | 235097 | 41.6%
latin_america | 161402 | 28.6%
asia | 91675 | 16.2%
africa | 47164 | 8.3%
oceania | 14385 | 2.5%
europe | 13906 | 2.5%
antarctica | 1501 | 0.3%

Most occurring characters

ValueCountFrequency (%)
A 1265351
21.1%
I 712626
11.9%
R 694167
11.6%
C 461050
 
7.7%
E 438696
 
7.3%
N 412385
 
6.9%
T 399501
 
6.7%
_ 396499
 
6.6%
M 396499
 
6.6%
O 263388
 
4.4%
Other values (6) 563150
9.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5606813
93.4%
Connector Punctuation 396499
 
6.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 1265351
22.6%
I 712626
12.7%
R 694167
12.4%
C 461050
 
8.2%
E 438696
 
7.8%
N 412385
 
7.4%
T 399501
 
7.1%
M 396499
 
7.1%
O 263388
 
4.7%
H 235097
 
4.2%
Other values (5) 328053
 
5.9%
Connector Punctuation
ValueCountFrequency (%)
_ 396499
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5606813
93.4%
Common 396499
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 1265351
22.6%
I 712626
12.7%
R 694167
12.4%
C 461050
 
8.2%
E 438696
 
7.8%
N 412385
 
7.4%
T 399501
 
7.1%
M 396499
 
7.1%
O 263388
 
4.7%
H 235097
 
4.2%
Other values (5) 328053
 
5.9%
Common
ValueCountFrequency (%)
_ 396499
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6003312
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 1265351
21.1%
I 712626
11.9%
R 694167
11.6%
C 461050
 
7.7%
E 438696
 
7.3%
N 412385
 
6.9%
T 399501
 
6.7%
_ 396499
 
6.6%
M 396499
 
6.6%
O 263388
 
4.4%
Other values (6) 563150
9.4%

publishedByGbifRegion
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 7599696
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NORTH_AMERICA
2nd row: NORTH_AMERICA
3rd row: NORTH_AMERICA
4th row: NORTH_AMERICA
5th row: NORTH_AMERICA

Value | Count | Frequency (%)
north_america | 584592 | 100.0%

Most occurring characters

ValueCountFrequency (%)
R 1169184
15.4%
A 1169184
15.4%
N 584592
7.7%
O 584592
7.7%
T 584592
7.7%
H 584592
7.7%
_ 584592
7.7%
M 584592
7.7%
E 584592
7.7%
I 584592
7.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 7015104
92.3%
Connector Punctuation 584592
 
7.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R 1169184
16.7%
A 1169184
16.7%
N 584592
8.3%
O 584592
8.3%
T 584592
8.3%
H 584592
8.3%
M 584592
8.3%
E 584592
8.3%
I 584592
8.3%
C 584592
8.3%
Connector Punctuation
ValueCountFrequency (%)
_ 584592
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7015104
92.3%
Common 584592
 
7.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
R 1169184
16.7%
A 1169184
16.7%
N 584592
8.3%
O 584592
8.3%
T 584592
8.3%
H 584592
8.3%
M 584592
8.3%
E 584592
8.3%
I 584592
8.3%
C 584592
8.3%
Common
ValueCountFrequency (%)
_ 584592
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7599696
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R 1169184
15.4%
A 1169184
15.4%
N 584592
7.7%
O 584592
7.7%
T 584592
7.7%
H 584592
7.7%
_ 584592
7.7%
M 584592
7.7%
E 584592
7.7%
I 584592
7.7%

level0Gid
Text

Missing 

Distinct: 105
Distinct (%): 0.5%
Missing: 562100
Missing (%): 96.2%
Memory size: 4.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 67476
Distinct characters: 28
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 0.1%

Sample

1st row: USA
2nd row: MYS
3rd row: COL
4th row: COL
5th row: IND

Value | Count | Frequency (%)
usa | 3144 | 14.0%
eth | 2614 | 11.6%
col | 2574 | 11.4%
tza | 1839 | 8.2%
afg | 1717 | 7.6%
rus | 826 | 3.7%
per | 772 | 3.4%
guy | 696 | 3.1%
bra | 631 | 2.8%
ven | 607 | 2.7%
Other values (95) | 7072 | 31.4%

Most occurring characters

ValueCountFrequency (%)
A 9076
13.5%
T 5591
 
8.3%
U 5200
 
7.7%
S 4996
 
7.4%
E 4583
 
6.8%
R 4031
 
6.0%
H 3631
 
5.4%
L 3528
 
5.2%
C 3243
 
4.8%
G 3139
 
4.7%
Other values (18) 20458
30.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 67474
> 99.9%
Decimal Number 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 9076
13.5%
T 5591
 
8.3%
U 5200
 
7.7%
S 4996
 
7.4%
E 4583
 
6.8%
R 4031
 
6.0%
H 3631
 
5.4%
L 3528
 
5.2%
C 3243
 
4.8%
G 3139
 
4.7%
Other values (16) 20456
30.3%
Decimal Number
ValueCountFrequency (%)
0 1
50.0%
2 1
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 67474
> 99.9%
Common 2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 9076
13.5%
T 5591
 
8.3%
U 5200
 
7.7%
S 4996
 
7.4%
E 4583
 
6.8%
R 4031
 
6.0%
H 3631
 
5.4%
L 3528
 
5.2%
C 3243
 
4.8%
G 3139
 
4.7%
Other values (16) 20456
30.3%
Common
ValueCountFrequency (%)
0 1
50.0%
2 1
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 67476
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 9076
13.5%
T 5591
 
8.3%
U 5200
 
7.7%
S 4996
 
7.4%
E 4583
 
6.8%
R 4031
 
6.0%
H 3631
 
5.4%
L 3528
 
5.2%
C 3243
 
4.8%
G 3139
 
4.7%
Other values (18) 20458
30.3%

level0Name
Text

Missing 

Distinct: 105
Distinct (%): 0.5%
Missing: 562100
Missing (%): 96.2%
Memory size: 4.5 MiB

Length

Max length: 32
Median length: 30
Mean length: 8.545304997
Min length: 4

Characters and Unicode

Total characters: 192201
Distinct characters: 51
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 0.1%

Sample

1st row: United States
2nd row: Malaysia
3rd row: Colombia
4th row: Colombia
5th row: India

Value | Count | Frequency (%)
united | 3159 | 11.8%
states | 3150 | 11.8%
ethiopia | 2614 | 9.8%
colombia | 2574 | 9.7%
tanzania | 1839 | 6.9%
afghanistan | 1717 | 6.4%
russia | 826 | 3.1%
peru | 772 | 2.9%
guyana | 696 | 2.6%
brazil | 631 | 2.4%
Other values (124) | 8681 | 32.6%

Most occurring characters

ValueCountFrequency (%)
a 30627
15.9%
i 21547
 
11.2%
n 15591
 
8.1%
t 15171
 
7.9%
e 11832
 
6.2%
o 9996
 
5.2%
s 7991
 
4.2%
l 5975
 
3.1%
h 5409
 
2.8%
d 5050
 
2.6%
Other values (41) 63012
32.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 161462
84.0%
Uppercase Letter 26571
 
13.8%
Space Separator 4167
 
2.2%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 30627
19.0%
i 21547
13.3%
n 15591
9.7%
t 15171
9.4%
e 11832
 
7.3%
o 9996
 
6.2%
s 7991
 
4.9%
l 5975
 
3.7%
h 5409
 
3.4%
d 5050
 
3.1%
Other values (17) 32273
20.0%
Uppercase Letter
ValueCountFrequency (%)
S 3783
14.2%
U 3541
13.3%
C 2941
11.1%
E 2916
11.0%
T 2351
8.8%
A 1931
7.3%
P 1892
7.1%
G 1242
 
4.7%
M 1221
 
4.6%
I 1009
 
3.8%
Other values (12) 3744
14.1%
Space Separator
ValueCountFrequency (%)
4167
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 188033
97.8%
Common 4168
 
2.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 30627
16.3%
i 21547
 
11.5%
n 15591
 
8.3%
t 15171
 
8.1%
e 11832
 
6.3%
o 9996
 
5.3%
s 7991
 
4.2%
l 5975
 
3.2%
h 5409
 
2.9%
d 5050
 
2.7%
Other values (39) 58844
31.3%
Common
ValueCountFrequency (%)
4167
> 99.9%
, 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 191824
99.8%
None 377
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 30627
16.0%
i 21547
 
11.2%
n 15591
 
8.1%
t 15171
 
7.9%
e 11832
 
6.2%
o 9996
 
5.2%
s 7991
 
4.2%
l 5975
 
3.1%
h 5409
 
2.8%
d 5050
 
2.6%
Other values (40) 62635
32.7%
None
ValueCountFrequency (%)
é 377
100.0%

level1Gid
Text

Missing 

Distinct: 474
Distinct (%): 2.1%
Missing: 562129
Missing (%): 96.2%
Memory size: 4.5 MiB

Length

Max length: 8
Median length: 8
Mean length: 7.554333793
Min length: 6

Characters and Unicode

Total characters: 169693
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 114
Unique (%): 0.5%

Sample

1st row: USA.49_1
2nd row: MYS.13_1
3rd row: COL.6_2
4th row: COL.4_2
5th row: IND.2_1

Value | Count | Frequency (%)
eth.8_1 | 1052 | 4.7%
afg.28_1 | 995 | 4.4%
usa.2_1 | 907 | 4.0%
afg.15_1 | 663 | 3.0%
tza.14_1 | 573 | 2.6%
eth.4_1 | 547 | 2.4%
bra.14_1 | 486 | 2.2%
eth.6_1 | 475 | 2.1%
kwt.3_1 | 473 | 2.1%
per.8_1 | 473 | 2.1%
Other values (464) | 15819 | 70.4%

Most occurring characters

ValueCountFrequency (%)
1 28607
16.9%
_ 22463
13.2%
. 22443
13.2%
2 9571
 
5.6%
A 9043
 
5.3%
T 5565
 
3.3%
U 5200
 
3.1%
S 4996
 
2.9%
E 4583
 
2.7%
4 4144
 
2.4%
Other values (28) 53078
31.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 67387
39.7%
Decimal Number 57400
33.8%
Connector Punctuation 22463
 
13.2%
Other Punctuation 22443
 
13.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 9043
13.4%
T 5565
 
8.3%
U 5200
 
7.7%
S 4996
 
7.4%
E 4583
 
6.8%
R 4029
 
6.0%
H 3631
 
5.4%
L 3528
 
5.2%
C 3242
 
4.8%
G 3139
 
4.7%
Other values (16) 20431
30.3%
Decimal Number
ValueCountFrequency (%)
1 28607
49.8%
2 9571
 
16.7%
4 4144
 
7.2%
8 4002
 
7.0%
3 3062
 
5.3%
5 2456
 
4.3%
0 1764
 
3.1%
6 1648
 
2.9%
9 1238
 
2.2%
7 908
 
1.6%
Connector Punctuation
ValueCountFrequency (%)
_ 22463
100.0%
Other Punctuation
ValueCountFrequency (%)
. 22443
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 102306
60.3%
Latin 67387
39.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 9043
13.4%
T 5565
 
8.3%
U 5200
 
7.7%
S 4996
 
7.4%
E 4583
 
6.8%
R 4029
 
6.0%
H 3631
 
5.4%
L 3528
 
5.2%
C 3242
 
4.8%
G 3139
 
4.7%
Other values (16) 20431
30.3%
Common
ValueCountFrequency (%)
1 28607
28.0%
_ 22463
22.0%
. 22443
21.9%
2 9571
 
9.4%
4 4144
 
4.1%
8 4002
 
3.9%
3 3062
 
3.0%
5 2456
 
2.4%
0 1764
 
1.7%
6 1648
 
1.6%
Other values (2) 2146
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 169693
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 28607
16.9%
_ 22463
13.2%
. 22443
13.2%
2 9571
 
5.6%
A 9043
 
5.3%
T 5565
 
3.3%
U 5200
 
3.1%
S 4996
 
2.9%
E 4583
 
2.7%
4 4144
 
2.4%
Other values (28) 53078
31.3%

level1Name
Text

Missing 

Distinct: 464
Distinct (%): 2.1%
Missing: 562129
Missing (%): 96.2%
Memory size: 4.5 MiB

Length

Max length: 31
Median length: 25
Mean length: 8.892356319
Min length: 3

Characters and Unicode

Total characters: 199749
Distinct characters: 81
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 111
Unique (%): 0.5%

Sample

1st row: West Virginia
2nd row: Sabah
3rd row: Bolívar
4th row: Atlántico
5th row: Andhra Pradesh

Value | Count | Frequency (%)
oromia | 1052 | 3.7%
parwan | 995 | 3.5%
alaska | 907 | 3.2%
kandahar | 663 | 2.3%
morogoro | 573 | 2.0%
benshangul-gumaz | 547 | 1.9%
la | 528 | 1.8%
pará | 486 | 1.7%
gambela | 475 | 1.7%
peoples | 475 | 1.7%
Other values (530) | 22076 | 76.7%

Most occurring characters

ValueCountFrequency (%)
a 34856
17.4%
r 15095
 
7.6%
o 12869
 
6.4%
n 12568
 
6.3%
i 10334
 
5.2%
e 9945
 
5.0%
l 7814
 
3.9%
s 7349
 
3.7%
6314
 
3.2%
u 6303
 
3.2%
Other values (71) 76302
38.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 160545
80.4%
Uppercase Letter 29921
 
15.0%
Space Separator 6314
 
3.2%
Dash Punctuation 1871
 
0.9%
Other Punctuation 1098
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 34856
21.7%
r 15095
 
9.4%
o 12869
 
8.0%
n 12568
 
7.8%
i 10334
 
6.4%
e 9945
 
6.2%
l 7814
 
4.9%
s 7349
 
4.6%
u 6303
 
3.9%
h 5478
 
3.4%
Other values (35) 37934
23.6%
Uppercase Letter
ValueCountFrequency (%)
C 3472
11.6%
P 3269
10.9%
A 3137
10.5%
M 2361
 
7.9%
B 1881
 
6.3%
K 1834
 
6.1%
S 1762
 
5.9%
T 1704
 
5.7%
G 1516
 
5.1%
N 1511
 
5.0%
Other values (19) 7474
25.0%
Other Punctuation
ValueCountFrequency (%)
. 582
53.0%
' 365
33.2%
/ 55
 
5.0%
! 51
 
4.6%
, 45
 
4.1%
Space Separator
ValueCountFrequency (%)
6314
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1871
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 190466
95.4%
Common 9283
 
4.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 34856
18.3%
r 15095
 
7.9%
o 12869
 
6.8%
n 12568
 
6.6%
i 10334
 
5.4%
e 9945
 
5.2%
l 7814
 
4.1%
s 7349
 
3.9%
u 6303
 
3.3%
h 5478
 
2.9%
Other values (64) 67855
35.6%
Common
ValueCountFrequency (%)
6314
68.0%
- 1871
 
20.2%
. 582
 
6.3%
' 365
 
3.9%
/ 55
 
0.6%
! 51
 
0.5%
, 45
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 197034
98.6%
None 2705
 
1.4%
Latin Ext Additional 10
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 34856
17.7%
r 15095
 
7.7%
o 12869
 
6.5%
n 12568
 
6.4%
i 10334
 
5.2%
e 9945
 
5.0%
l 7814
 
4.0%
s 7349
 
3.7%
6314
 
3.2%
u 6303
 
3.2%
Other values (49) 73587
37.3%
None
ValueCountFrequency (%)
á 1273
47.1%
í 439
 
16.2%
ó 393
 
14.5%
é 150
 
5.5%
ú 102
 
3.8%
č 87
 
3.2%
ð 68
 
2.5%
ö 46
 
1.7%
ț 33
 
1.2%
ş 28
 
1.0%
Other values (10) 86
 
3.2%
Latin Ext Additional
ValueCountFrequency (%)
9
90.0%
1
 
10.0%

level2Gid
Text

Missing 

Distinct: 1023
Distinct (%): 4.7%
Missing: 562935
Missing (%): 96.3%
Memory size: 4.5 MiB

Length

Max length: 12
Median length: 11
Mean length: 9.903633929
Min length: 9

Characters and Unicode

Total characters: 214483
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 301
Unique (%): 1.4%

Sample

1st row: USA.49.36_1
2nd row: MYS.13.14_1
3rd row: COL.6.38_2
4th row: COL.4.9_2
5th row: IND.2.10_1

Value | Count | Frequency (%)
afg.28.1_1 | 995 | 4.6%
afg.15.3_1 | 663 | 3.1%
eth.4.2_1 | 547 | 2.5%
eth.8.3_1 | 515 | 2.4%
eth.6.1_1 | 475 | 2.2%
per.8.9_1 | 473 | 2.2%
tza.14.6_1 | 457 | 2.1%
bra.14.8_2 | 452 | 2.1%
eth.8.15_1 | 341 | 1.6%
tza.20.4_1 | 306 | 1.4%
Other values (1013) | 16433 | 75.9%

Most occurring characters

ValueCountFrequency (%)
. 43294
20.2%
1 34220
16.0%
_ 21657
 
10.1%
2 15272
 
7.1%
A 8908
 
4.2%
4 6824
 
3.2%
3 6631
 
3.1%
8 5800
 
2.7%
U 5194
 
2.4%
S 4985
 
2.3%
Other values (28) 61698
28.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 84563
39.4%
Uppercase Letter 64969
30.3%
Other Punctuation 43294
20.2%
Connector Punctuation 21657
 
10.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 8908
13.7%
U 5194
 
8.0%
S 4985
 
7.7%
T 4930
 
7.6%
E 4582
 
7.1%
R 3925
 
6.0%
H 3631
 
5.6%
L 3521
 
5.4%
C 3238
 
5.0%
G 3132
 
4.8%
Other values (16) 18923
29.1%
Decimal Number
ValueCountFrequency (%)
1 34220
40.5%
2 15272
18.1%
4 6824
 
8.1%
3 6631
 
7.8%
8 5800
 
6.9%
5 4425
 
5.2%
6 3541
 
4.2%
0 2843
 
3.4%
7 2655
 
3.1%
9 2352
 
2.8%
Other Punctuation
ValueCountFrequency (%)
. 43294
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 21657
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 149514
69.7%
Latin 64969
30.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 8908 | 13.7%
U | 5194 | 8.0%
S | 4985 | 7.7%
T | 4930 | 7.6%
E | 4582 | 7.1%
R | 3925 | 6.0%
H | 3631 | 5.6%
L | 3521 | 5.4%
C | 3238 | 5.0%
G | 3132 | 4.8%
Other values (16) | 18923 | 29.1%

Common
Value | Count | Frequency (%)
. | 43294 | 29.0%
1 | 34220 | 22.9%
_ | 21657 | 14.5%
2 | 15272 | 10.2%
4 | 6824 | 4.6%
3 | 6631 | 4.4%
8 | 5800 | 3.9%
5 | 4425 | 3.0%
6 | 3541 | 2.4%
0 | 2843 | 1.9%
Other values (2) | 5007 | 3.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 214483 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
. | 43294 | 20.2%
1 | 34220 | 16.0%
_ | 21657 | 10.1%
2 | 15272 | 7.1%
A | 8908 | 4.2%
4 | 6824 | 3.2%
3 | 6631 | 3.1%
8 | 5800 | 2.7%
U | 5194 | 2.4%
S | 4985 | 2.3%
Other values (28) | 61698 | 28.8%

level2Name
Text

Missing 

Distinct: 991
Distinct (%): 4.6%
Missing: 563182
Missing (%): 96.3%
Memory size: 4.5 MiB

Length

Max length: 32
Median length: 28
Mean length: 9.567211583
Min length: 2

Characters and Unicode

Total characters: 204834
Distinct characters: 94
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 282
Unique (%): 1.3%

Sample

1st row: Pendleton
2nd row: Penampang
3rd row: Simití
4th row: Manatí
5th row: Visakhapatnam
Value | Count | Frequency (%)
bagram | 995 | 3.0%
la | 771 | 2.3%
rayon | 678 | 2.1%
daman | 663 | 2.0%
of | 606 | 1.8%
kemashi | 547 | 1.7%
san | 517 | 1.6%
borena | 515 | 1.6%
rest | 489 | 1.5%
convención | 484 | 1.5%
Other values (1170) | 26622 | 80.9%

Most occurring characters

Value | Count | Frequency (%)
a | 29287 | 14.3%
o | 14241 | 7.0%
n | 13826 | 6.7%
e | 13296 | 6.5%
i | 12566 | 6.1%
r | 11826 | 5.8%
(space) | 11477 | 5.6%
t | 8023 | 3.9%
s | 6919 | 3.4%
l | 6642 | 3.2%
Other values (84) | 76731 | 37.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 161183 | 78.7%
Uppercase Letter | 30130 | 14.7%
Space Separator | 11477 | 5.6%
Decimal Number | 897 | 0.4%
Other Punctuation | 858 | 0.4%
Dash Punctuation | 279 | 0.1%
Open Punctuation | 10 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 29287 | 18.2%
o | 14241 | 8.8%
n | 13826 | 8.6%
e | 13296 | 8.2%
i | 12566 | 7.8%
r | 11826 | 7.3%
t | 8023 | 5.0%
s | 6919 | 4.3%
l | 6642 | 4.1%
g | 5977 | 3.7%
Other values (39) | 38580 | 23.9%

Uppercase Letter
Value | Count | Frequency (%)
B | 3445 | 11.4%
A | 2880 | 9.6%
S | 2800 | 9.3%
M | 2700 | 9.0%
C | 2361 | 7.8%
K | 1985 | 6.6%
R | 1777 | 5.9%
T | 1728 | 5.7%
L | 1648 | 5.5%
P | 1369 | 4.5%
Other values (19) | 7437 | 24.7%

Decimal Number
Value | Count | Frequency (%)
3 | 297 | 33.1%
1 | 220 | 24.5%
7 | 175 | 19.5%
9 | 125 | 13.9%
8 | 38 | 4.2%
0 | 28 | 3.1%
4 | 9 | 1.0%
2 | 3 | 0.3%
5 | 2 | 0.2%

Other Punctuation
Value | Count | Frequency (%)
. | 635 | 74.0%
' | 119 | 13.9%
, | 56 | 6.5%
/ | 48 | 5.6%

Space Separator
Value | Count | Frequency (%)
(space) | 11477 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 279 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 10 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 191313 | 93.4%
Common | 13521 | 6.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 29287 | 15.3%
o | 14241 | 7.4%
n | 13826 | 7.2%
e | 13296 | 6.9%
i | 12566 | 6.6%
r | 11826 | 6.2%
t | 8023 | 4.2%
s | 6919 | 3.6%
l | 6642 | 3.5%
g | 5977 | 3.1%
Other values (68) | 68710 | 35.9%

Common
Value | Count | Frequency (%)
(space) | 11477 | 84.9%
. | 635 | 4.7%
3 | 297 | 2.2%
- | 279 | 2.1%
1 | 220 | 1.6%
7 | 175 | 1.3%
9 | 125 | 0.9%
' | 119 | 0.9%
, | 56 | 0.4%
/ | 48 | 0.4%
Other values (6) | 90 | 0.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 202086 | 98.7%
None | 2739 | 1.3%
Latin Ext Additional | 9 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 29287 | 14.5%
o | 14241 | 7.0%
n | 13826 | 6.8%
e | 13296 | 6.6%
i | 12566 | 6.2%
r | 11826 | 5.9%
(space) | 11477 | 5.7%
t | 8023 | 4.0%
s | 6919 | 3.4%
l | 6642 | 3.3%
Other values (58) | 73983 | 36.6%

None
Value | Count | Frequency (%)
í | 905 | 33.0%
á | 755 | 27.6%
ó | 654 | 23.9%
é | 138 | 5.0%
ð | 57 | 2.1%
ú | 52 | 1.9%
ñ | 48 | 1.8%
â | 30 | 1.1%
É | 24 | 0.9%
æ | 18 | 0.7%
Other values (12) | 58 | 2.1%

Latin Ext Additional
ValueCountFrequency (%)
4
44.4%
3
33.3%
1
 
11.1%
1
 
11.1%

level3Gid
Text

Missing 

Distinct: 468
Distinct (%): 5.1%
Missing: 575359
Missing (%): 98.4%
Memory size: 4.5 MiB

Length

Max length: 14
Median length: 13
Mean length: 11.89450883
Min length: 11

Characters and Unicode

Total characters: 109822
Distinct characters: 34
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique169 ?
Unique (%)1.8%

Sample

1st row: IND.2.10.3_1
2nd row: RUS.34.42.1_1
3rd row: TZA.9.4.11_1
4th row: GRC.6.2.16_1
5th row: ETH.8.3.1_1
Value | Count | Frequency (%)
eth.4.2.2_1 | 547 | 5.9%
eth.8.3.1_1 | 499 | 5.4%
eth.6.1.3_1 | 464 | 5.0%
tza.14.6.4_1 | 457 | 4.9%
per.8.9.7_1 | 329 | 3.6%
tza.20.4.4_1 | 306 | 3.3%
eth.2.3.6_1 | 289 | 3.1%
eth.8.15.11_1 | 277 | 3.0%
ind.31.22.2_1 | 228 | 2.5%
tza.9.4.11_1 | 203 | 2.2%
Other values (458) | 5634 | 61.0%

Most occurring characters

Value | Count | Frequency (%)
. | 27699 | 25.2%
1 | 19645 | 17.9%
_ | 9233 | 8.4%
2 | 5321 | 4.8%
T | 4830 | 4.4%
4 | 4335 | 3.9%
3 | 4012 | 3.7%
H | 3611 | 3.3%
E | 3546 | 3.2%
8 | 2860 | 2.6%
Other values (24) | 24730 | 22.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 45193 | 41.2%
Other Punctuation | 27699 | 25.2%
Uppercase Letter | 27697 | 25.2%
Connector Punctuation | 9233 | 8.4%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 4830 | 17.4%
H | 3611 | 13.0%
E | 3546 | 12.8%
A | 2746 | 9.9%
R | 2293 | 8.3%
Z | 2007 | 7.2%
P | 1580 | 5.7%
N | 1224 | 4.4%
U | 888 | 3.2%
S | 849 | 3.1%
Other values (12) | 4123 | 14.9%

Decimal Number
Value | Count | Frequency (%)
1 | 19645 | 43.5%
2 | 5321 | 11.8%
4 | 4335 | 9.6%
3 | 4012 | 8.9%
8 | 2860 | 6.3%
6 | 2581 | 5.7%
0 | 1912 | 4.2%
5 | 1768 | 3.9%
9 | 1425 | 3.2%
7 | 1334 | 3.0%

Other Punctuation
Value | Count | Frequency (%)
. | 27699 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 9233 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 82125 | 74.8%
Latin | 27697 | 25.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 4830 | 17.4%
H | 3611 | 13.0%
E | 3546 | 12.8%
A | 2746 | 9.9%
R | 2293 | 8.3%
Z | 2007 | 7.2%
P | 1580 | 5.7%
N | 1224 | 4.4%
U | 888 | 3.2%
S | 849 | 3.1%
Other values (12) | 4123 | 14.9%

Common
Value | Count | Frequency (%)
. | 27699 | 33.7%
1 | 19645 | 23.9%
_ | 9233 | 11.2%
2 | 5321 | 6.5%
4 | 4335 | 5.3%
3 | 4012 | 4.9%
8 | 2860 | 3.5%
6 | 2581 | 3.1%
0 | 1912 | 2.3%
5 | 1768 | 2.2%
Other values (2) | 2759 | 3.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 109822 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
. | 27699 | 25.2%
1 | 19645 | 17.9%
_ | 9233 | 8.4%
2 | 5321 | 4.8%
T | 4830 | 4.4%
4 | 4335 | 3.9%
3 | 4012 | 3.7%
H | 3611 | 3.3%
E | 3546 | 3.2%
8 | 2860 | 2.6%
Other values (24) | 24730 | 22.5%

level3Name
Text

Missing 

Distinct: 441
Distinct (%): 5.4%
Missing: 576369
Missing (%): 98.6%
Memory size: 4.5 MiB

Length

Max length: 30
Median length: 24
Mean length: 8.994041104
Min length: 3

Characters and Unicode

Total characters: 73958
Distinct characters: 85
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 158
Unique (%): 1.9%

Sample

1st row: Chintapalle
2nd row: Kwakoa
3rd row: Paranesti
4th row: Abaya
5th row: Bio Jiganifado
Value | Count | Frequency (%)
bio | 547 | 4.8%
jiganifado | 547 | 4.8%
abaya | 499 | 4.4%
zuria | 483 | 4.3%
gambela | 464 | 4.1%
hembeti | 457 | 4.0%
quimbiri | 329 | 2.9%
kisarawe | 306 | 2.7%
gewane | 289 | 2.5%
lome | 277 | 2.4%
Other values (560) | 7143 | 63.0%

Most occurring characters

Value | Count | Frequency (%)
a | 11829 | 16.0%
i | 6915 | 9.3%
e | 5698 | 7.7%
o | 3716 | 5.0%
n | 3653 | 4.9%
(space) | 3118 | 4.2%
r | 3027 | 4.1%
m | 2665 | 3.6%
u | 2447 | 3.3%
b | 2389 | 3.2%
Other values (75) | 28501 | 38.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 58838 | 79.6%
Uppercase Letter | 10778 | 14.6%
Space Separator | 3118 | 4.2%
Decimal Number | 463 | 0.6%
Other Punctuation | 400 | 0.5%
Open Punctuation | 158 | 0.2%
Close Punctuation | 158 | 0.2%
Dash Punctuation | 45 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 11829 | 20.1%
i | 6915 | 11.8%
e | 5698 | 9.7%
o | 3716 | 6.3%
n | 3653 | 6.2%
r | 3027 | 5.1%
m | 2665 | 4.5%
u | 2447 | 4.2%
b | 2389 | 4.1%
t | 2369 | 4.0%
Other values (31) | 14130 | 24.0%

Uppercase Letter
Value | Count | Frequency (%)
A | 998 | 9.3%
K | 921 | 8.5%
B | 904 | 8.4%
G | 846 | 7.8%
L | 711 | 6.6%
H | 671 | 6.2%
J | 566 | 5.3%
N | 547 | 5.1%
Z | 518 | 4.8%
C | 504 | 4.7%
Other values (16) | 3592 | 33.3%

Decimal Number
Value | Count | Frequency (%)
4 | 112 | 24.2%
3 | 98 | 21.2%
7 | 95 | 20.5%
1 | 62 | 13.4%
2 | 32 | 6.9%
0 | 23 | 5.0%
5 | 16 | 3.5%
9 | 13 | 2.8%
6 | 11 | 2.4%
8 | 1 | 0.2%

Other Punctuation
Value | Count | Frequency (%)
. | 306 | 76.5%
, | 80 | 20.0%
' | 13 | 3.2%
/ | 1 | 0.2%

Space Separator
Value | Count | Frequency (%)
(space) | 3118 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 158 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 158 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 45 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 69616 | 94.1%
Common | 4342 | 5.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 11829 | 17.0%
i | 6915 | 9.9%
e | 5698 | 8.2%
o | 3716 | 5.3%
n | 3653 | 5.2%
r | 3027 | 4.3%
m | 2665 | 3.8%
u | 2447 | 3.5%
b | 2389 | 3.4%
t | 2369 | 3.4%
Other values (57) | 24908 | 35.8%

Common
Value | Count | Frequency (%)
(space) | 3118 | 71.8%
. | 306 | 7.0%
( | 158 | 3.6%
) | 158 | 3.6%
4 | 112 | 2.6%
3 | 98 | 2.3%
7 | 95 | 2.2%
, | 80 | 1.8%
1 | 62 | 1.4%
- | 45 | 1.0%
Other values (8) | 110 | 2.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 73811 | 99.8%
None | 135 | 0.2%
Latin Ext Additional | 12 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 11829 | 16.0%
i | 6915 | 9.4%
e | 5698 | 7.7%
o | 3716 | 5.0%
n | 3653 | 4.9%
(space) | 3118 | 4.2%
r | 3027 | 4.1%
m | 2665 | 3.6%
u | 2447 | 3.3%
b | 2389 | 3.2%
Other values (60) | 28354 | 38.4%

None
Value | Count | Frequency (%)
í | 46 | 34.1%
â | 21 | 15.6%
ó | 16 | 11.9%
è | 10 | 7.4%
ê | 9 | 6.7%
ñ | 9 | 6.7%
á | 9 | 6.7%
ơ | 7 | 5.2%
ü | 4 | 3.0%
ư | 4 | 3.0%

Latin Ext Additional
ValueCountFrequency (%)
3
25.0%
ế 3
25.0%
3
25.0%
2
16.7%
1
 
8.3%

iucnRedListCategory
Text

Missing 

Distinct: 9
Distinct (%): < 0.1%
Missing: 273793
Missing (%): 46.8%
Memory size: 4.5 MiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 621598
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: LC
2nd row: LC
3rd row: LC
4th row: LC
5th row: LC
Value | Count | Frequency (%)
lc | 259391 | 83.5%
ne | 22703 | 7.3%
nt | 14823 | 4.8%
vu | 8832 | 2.8%
en | 3006 | 1.0%
cr | 1367 | 0.4%
ex | 575 | 0.2%
dd | 71 | < 0.1%
ew | 31 | < 0.1%
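
The two-letter values correspond to the standard IUCN Red List category abbreviations. A minimal sketch for turning the codes into readable labels follows; the category names are those published by the IUCN and do not come from this report, and the names `IUCN_LABELS` and `expand_iucn` are hypothetical.

```python
from typing import Optional

# IUCN Red List category abbreviations and their published names.
IUCN_LABELS = {
    "EX": "Extinct",
    "EW": "Extinct in the Wild",
    "CR": "Critically Endangered",
    "EN": "Endangered",
    "VU": "Vulnerable",
    "NT": "Near Threatened",
    "LC": "Least Concern",
    "DD": "Data Deficient",
    "NE": "Not Evaluated",
}


def expand_iucn(code: Optional[str]) -> Optional[str]:
    """Return the category name for a code, or None for missing/unknown input."""
    if code is None:
        return None
    # The table above lists lowercased tokens, so normalise case first.
    return IUCN_LABELS.get(code.strip().upper())


for code in ["lc", "ne", "nt", "vu", "en", "cr", "ex", "dd", "ew"]:
    print(code, "->", expand_iucn(code))
```

Lookups are case-insensitive because the frequency table shows lowercased tokens while the sample rows use uppercase codes.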

Most occurring characters

Value | Count | Frequency (%)
C | 260758 | 41.9%
L | 259391 | 41.7%
N | 40532 | 6.5%
E | 26315 | 4.2%
T | 14823 | 2.4%
V | 8832 | 1.4%
U | 8832 | 1.4%
R | 1367 | 0.2%
X | 575 | 0.1%
D | 142 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 621598 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 260758 | 41.9%
L | 259391 | 41.7%
N | 40532 | 6.5%
E | 26315 | 4.2%
T | 14823 | 2.4%
V | 8832 | 1.4%
U | 8832 | 1.4%
R | 1367 | 0.2%
X | 575 | 0.1%
D | 142 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 621598 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
C | 260758 | 41.9%
L | 259391 | 41.7%
N | 40532 | 6.5%
E | 26315 | 4.2%
T | 14823 | 2.4%
V | 8832 | 1.4%
U | 8832 | 1.4%
R | 1367 | 0.2%
X | 575 | 0.1%
D | 142 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 621598 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
C | 260758 | 41.9%
L | 259391 | 41.7%
N | 40532 | 6.5%
E | 26315 | 4.2%
T | 14823 | 2.4%
V | 8832 | 1.4%
U | 8832 | 1.4%
R | 1367 | 0.2%
X | 575 | 0.1%
D | 142 | < 0.1%